Abstract

This paper analyses the cognitive processes in a widely used, non-verbal test of 
analytic intelligence, the Raven Progressive Matrices Test (Raven, 1962). The analysis
determines which processes distinguish between higher-scoring and lower-scoring subjects and 
which processes are common to all subjects and all items on the test. The analysis is 
based on detailed performance characteristics such as verbal protocols, eye fixation patterns 
and errors. The theory is expressed as a pair of computer simulation models that perform
like the median or best college students in the sample. 

The processing characteristic that is common to all subjects is an incremental,
reiterative strategy for encoding and inducing the regularities in each problem. The processes
that distinguish among individuals are primarily the ability to induce abstract relations and
the ability to dynamically manage a large set of problem-solving goals in working memory.
This paper analyzes a form of thinking that is prototypical of what psychologists
consider to be analytic intelligence. We will use the term "analytic intelligence" to refer to
the ability to reason and solve problems involving new information, without relying
extensively on an explicit base of declarative knowledge derived from either schooling or 
previous experience. In the theory of R. Cattell (1963), this form of intelligence has been
labeled fluid intelligence and has been contrasted with crystallized intelligence, which more 
directly reflects the previously acquired knowledge and skills that have been crystallized with 
experience. Thus, analytic intelligence refers to the ability to deal with novelty, to adapt
one's thinking to a new cognitive problem. In this paper, we provide a theoretical account 
of what it means to perform well on a classic test of analytic intelligence, the Raven 
Progressive Matrices test (Raven, 1962).

This paper describes a detailed theoretical model of the processes in solving the 
Raven test, contrasting the performance of college students who are less successful in
solving the problems with those who are more successful. The model is based on multiple
dependent measures, including verbal reports, eye fixations and patterns of errors on 
different types of problems. The experimental investigations led to the development of 
computer simulation models that test the sufficiency of our analysis. Two computer 
simulations, FAIRAVEN and BETTERAVEN, express the differences between good and
extremely good performance on the test. FAIRAVEN performs like the median college
student in our sample; BETTERAVEN performs like one of the very best. BETTERAVEN
differs from FAIRAVEN in two major ways. BETTERAVEN has the ability to induce
more abstract relations than FAIRAVEN. In addition, BETTERAVEN has the ability to
manage a larger set of goals in working memory and hence can solve more complex 
problems. The two models and the contrast between them specify the nature of the 
analytic intelligence required to perform the test and the nature of individual differences in 
this type of intelligence. 

There are several reasons why the Raven test provides an appropriate test bed to 
analyze analytic intelligence. First, the size and stability of the individual differences that 
the test elicits, even among college students, suggest that the underlying differences in
cognitive processes are susceptible to cognitive analysis. Second, the relatively large 
number of items on the test (36 problems) permits an adequate data base for the
theoretical and experimental analyses of the problem-solving behavior. Third, the visual 
format of the problems makes it possible to exploit the fine-grained, process-tracing 
methodology afforded by eye fixation studies (Just & Carpenter, 1976). Finally, the 
correlation between Raven test scores and measures of intellectual achievement suggests 
that the underlying processes may be general, rather than specific to this one test (Court & 
Raven, 1982), although like most correlations, this one must be interpreted with caution.

The Raven test, including the simpler Standard Progressive Matrices Test and the 
Coloured Progressive Matrices Test, is also widely used in both research and clinical
settings. The test is used extensively by the military in several western countries (for
example, see Belmont & Marolla, 1973). Also, because of its non-verbal format, it is a
common research tool used with children, the elderly, and patient populations for whom it
is desirable to minimize the processing of language. The wide usage means that there is a 
great deal of information about the performance profiles of various populations. But more 
importantly, it means that a cognitive analysis of the processes and structures that underlie 
performance has potential practical importance in the domains in which the test is used 
either for research or classification. 

Several different research approaches have converged on the conclusion that the Raven 
test measures processes that are central to analytic intelligence. Individual differences in
the Raven correlate highly with those found in other complex, cognitive tests (see Jensen, 
1987). The centrality of the Raven among psychometric tests is graphically illustrated in 
several nonmetric scaling studies that examined the interrelations among ability test scores
obtained both from archival sources and more recently collected data (Snow, Kyllonen &
Marshalek, 1984). The scaling solutions for the different data bases showed remarkably
similar patterns. The Raven and other complex reasoning tests were at the center of the 
solution. Simpler tests were located towards the periphery and they clustered according to 
their content, as shown in Figure 1a. This particular scaling analysis is based on the
results from a variety of cognitive tests given to 241 high school students (Marshalek,
Lohman & Snow, 1983). Snow et al. constructed an idealized space to summarize the
results of their numerous scaling solutions, in which they placed the Raven test at the
center, as shown in Figure 1b. In this idealized solution, task complexity is maximal near
the center and decreases outward toward the periphery. The tests in the annulus
surrounding the Raven test involve abstract reasoning, induction of relations, and deduction.
For tests of intermediate or low complexity only, there is a clustering as a function of the 
test content, with separate clusters for verbal, numerical and spatial tests. By contrast, the 
more complex tests of reasoning at the center of the space were highly intercorrelated in
spite of differences in specific content. 



Insert Figure 1a and 1b - Marshalek et al. results

One of the sources of the Raven test's centrality, according to Marshalek, Lohman 
and Snow, was that "... more complex tasks may require more involvement of executive
assembly and control processes that structure and analyze the problem, assemble a strategy
of attack on it, monitor the performance process, and adapt these strategies as performance
proceeds..." (1983, p. 124). This theoretical interpretation is based on the outcome of the 
scaling studies. Our research also converges on the importance of executive processes, but 
the conclusions are derived from a process analysis of the Raven test. 

Although there has been some dispute among psychometricians about which tests in 
the larger space might be said to reflect analytic intelligence, the Raven test is central with 
respect to either interpretation. In one view, intelligence refers to a construct underlying a 
small range of tests, in particular, those at the center of the space. This view is 
associated with Spearman, although Spearman himself avoided the term "intelligence" and 
instead used the term g to refer to the determinants of shared variance among tests of 
intellectual ability (Jensen, 1987; Spearman, 1927). An alternative view, associated with 
Thurstone, applies the term "intelligence" to a large set of diverse mental abilities,
including quite domain-specific abilities, such as those in the periphery of the space
(Thurstone, 1938). Although the two views differ in the size of the spaces which they 
associate with intelligence, the centrality of the Raven test emerges in either case. The 
centrality of the Raven test indicates not only that it is a good measure of intelligence, but 
also that a theory of the processing in the Raven test should account for a good deal of 
the reasoning in the other tests in the center of the space as well. 

This paper has four parts. Part I describes the structure of the problems, focusing 
on the problem characteristics that are likely to tax the psychological processes. Part I also 
reports two studies that examine the processes empirically, determining which processes 
distinguish between high-scoring subjects and lower-scoring subjects and which processes are
common to all subjects in their attempts to solve all problems. Part II describes the two 
simulation models that perform like the median subject or like the best subject. Part III 
compares the performance of the human subjects and the theoretical models in detail. Part 
IV generalizes the theory and examines its implications for a theory of intelligence. 
Figure 1a. A nonmetric scaling of the intercorrelations among various ability tests, showing
the centrality of the Raven (from Marshalek, Lohman & Snow, 1983, Figure 2, p.
122). The tests near the center of the space, such as the Raven and Letter Series
Tests, are the most complex and share variance despite their differences in content
(figural versus verbal). The outwardly radiating concentric circles indicate decreasing
levels of test complexity. The shapes of the plotted points also denote test 
complexity: squares (most complex), triangles (intermediate complexity), and circles 
(least complex). The shading of the plotted points indicates the content of the test: 
white (figural), black (verbal) and dotted (numerical). (Reprinted by permission of 
authors.) 
Figure 1b. An idealized scaling solution that summarizes the relations among ability tests
across several sets of data, illustrating the centrality of the Raven test (from Snow,
Kyllonen & Marshalek, 1984; Figure 2.^^. 92). The outwardly radiating concentric
circles indicate decreasing levels of test complexity. Tests involving different content
(figural, verbal, and numerical) are separated by dashed radial lines. (Reprinted by
permission of authors and publisher).



PART I: PROBLEM STRUCTURE AND HUMAN PERFORMANCE 



A task analysis of the Raven Progressive Matrices Test suggests some of the 
cognitive processes that are likely to be implicated in solving the problems. The test
consists of a set of visual analogy problems. Each problem consists of a 3 x 3 matrix, in
which the bottom right entry is missing and must be selected from among eight response
alternatives arranged below the matrix. (Note that the word entry refers to each of the
nine cells of the matrix). Each entry typically contains one to five figural elements, such
as geometric figures, lines, or background textures. The test instructions tell the test-taker 
to look across the rows and then look down the columns to determine the rules and then 
to use the rules to determine the missing entry. The problem in Figure 2 illustrates the 
format.



Insert Figure 2 - sample problem

The variation among the entries in a row and column of this problem can be 
described by three rules: 

- Rule A. Each row contains three geometric figures (a diamond, a triangle and a 
square) distributed across its three entries. 

- Rule B. Each row contains three textured lines (dark, striped and clear) 
distributed across its three entries.

- Rule C. The orientation of the lines is constant within a row, but varies between 
rows (vertical, horizontal, then oblique). 

The missing entry can be generated from these rules. Rule A specifies that the
answer should contain a square (since the first two columns of the third row contain a 
triangle and diamond). Rule B specifies it should contain a dark line. Rule C specifies that 
the line orientation should be oblique, from upper left to lower right. These rules converge 
on the correct response alternative, #5. Some of the incorrect response alternatives are
designed to satisfy an incomplete set of rules. For example, if a subject induced Rule A 
but not B or C he might choose alternative #2 or #8. Similarly, inducing Rule B but 
omitting A and C leads to alternative #3. This sample problem illustrates the general 
structure of the test problems, but corresponds to one of the easiest problems in the test. 
The more difficult problems entail more rules or more difficult rules, and more figural 
elements per entry. 
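The way Rules A, B and C jointly determine the missing entry can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the simulation models: the dictionary encoding of an entry, the attribute names, and the two helper functions are assumptions introduced here. The shapes in the third row and the attributes of the answer follow the description above; the order of the two textures in the first two entries is assumed.

```python
# Hypothetical encoding of the third row of the sample problem.
# Shapes follow the text; the order of the two textures is an assumption.
SHAPES = {"diamond", "triangle", "square"}
TEXTURES = {"dark", "striped", "clear"}

def missing_by_distribution(seen, values):
    """Distribution-of-three-values: return the one value not yet used in the row."""
    remaining = set(values) - set(seen)
    if len(remaining) != 1:
        raise ValueError("row does not fit a distribution-of-three-values rule")
    return remaining.pop()

def missing_by_constant(seen):
    """Constant in a row: the missing entry repeats the row's single value."""
    if len(set(seen)) != 1:
        raise ValueError("row does not fit a constant-in-a-row rule")
    return seen[0]

# First two entries of the third row (the third entry is the one to be found).
third_row = [
    {"shape": "triangle", "texture": "striped", "orientation": "oblique"},
    {"shape": "diamond", "texture": "clear", "orientation": "oblique"},
]

answer = {
    # Rule A: the remaining geometric figure
    "shape": missing_by_distribution([e["shape"] for e in third_row], SHAPES),
    # Rule B: the remaining line texture
    "texture": missing_by_distribution([e["texture"] for e in third_row], TEXTURES),
    # Rule C: the orientation shared by the row
    "orientation": missing_by_constant([e["orientation"] for e in third_row]),
}
print(answer)
```

Dropping one of the three keys from `answer` corresponds to inducing an incomplete set of rules, which is how some of the incorrect response alternatives are constructed.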

Our research focuses on a form of the Raven test that is widely used for adults of 
higher ability, the Raven Advanced Progressive Matrices, Sets I and II. Set I, consisting
of 12 problems, is often used as a practice test or to obtain a rough estimate of a
subject's ability. It has been pointed out that the first several problems in Set I can be
solved by perceptually-based algorithms such as line continuation (Hunt, 1974). However,
the later problems in Set I and most of the 36 problems comprising Set II, which our
research examines, cannot be solved by perceptually-based algorithms, as Hunt noted. Like
the sample problem in Figure 2, the more difficult problems require that subjects analyze
the variation in the problem in order to induce the rules that generate the correct solution.
The problems requiring an analytic strategy can be used to discriminate among individuals 
with higher education, such as college students (Raven, 1965). 
Figure 2. A problem to illustrate the format of the Raven items. The variation among
the three geometric forms (diamond, square, triangle) and three textures of the line
(dark, striped, clear) is each governed by a distribution-of-three-values rule. The
orientation of the line is governed by a constant-in-a-row rule. (The correct answer
is #5).




Problem difficulty. Although all of the Raven problems share a similar format, there 
is substantial variation among them in their difficulty. The magnitude of the variation is 
apparent from the error rates (shown in Figure 3) of 2256 British adults, including 
telephone engineering applicants, students at a teacher training college and British Royal 
Air Force recruits (Forbes, 1964). There is an almost monotonic increase in difficulty from
the initial problems, which have negligible error rates, to the last few problems, which have 
extremely high error rates. (The error rates on the final problems reflect failures to 
attempt these problems in the testing period as well as failures to solve them correctly). 
The considerable range of error rates among problems leads to the question of what 
psychological processes account for the differences in problem difficulty and for the
differences among people in their ability to solve them. 



Insert Figure 3 - Forbes data 

The test's origins provide a clue to what the test was intended to measure. The
Raven Progressive Matrices test was developed by John Raven, a student of Spearman. As
we previously mentioned, Spearman (1927) believed that there was one central intellectual
ability (which he referred to as g), as well as numerous specific abilities. What g consisted
of was never precisely defined, but it was thought to involve "the eduction of relations". 
John Raven's conception of what his progressive matrices test measured was somewhat 
more articulated. His personal notes, generously made available to us by his son, J. Raven, 
indicate that he wanted to develop a series of overlapping, homogeneous problems whose 
solutions required different abilities. However, the descriptions of the abilities that Raven 
intended to measure are primarily characteristics of the problems, and not specifications of 
the requisite cognitive processes. John Raven constructed problems that focused on each of
six different problem characteristics, which approximately correspond to the different types 
of rules that we describe below. He used his intuition and clinical experience to rank order 
the difficulty of the six problem types. Many years later, normative data from Forbes, 
shown in Figure 3, became the basis for selecting problems for retention in newer versions 
of the test, and for arranging the problems in order of increasing difficulty, without regard 
to any underlying processing theory. Thus, the version of the test that is examined in this 
research is an amalgam of John Raven's implicit theory of the components of reasoning 
ability and subsequent item selection and ordering done on an actuarial basis.

Rule taxonomy

Across the Raven problems that we have examined, we have found that five different 
types of rules govern the variation among the entries. Many problems involve multiple 
rules, which may all be different rule types or several instances or tokens of the same type 
of rule. The problems in Figures 2, 4a, 4b and 4c illustrate the five different types of
rules that are described in Table 1. Almost all of the Raven problems in Sets I and II 
can be classified with respect to which of these rule types govern its variation, as shown in 
Appendix A.



Insert Table 1, Figure 4a, b, c

One qualification to this analysis is that sometimes the set of rules describing the 
variation in a problem is not unique. For example, quantitative pairwise progression is 
often interchangeable with a distribution-of-three-values. Consider a row consisting of an 
arrow pointing to twelve o'clock, four o'clock, and eight o'clock. This variation can be
described as a distribution-of-three-values or in terms of a quantitative progression in which 
the arrow's orientation is progressively rotated 120 degrees clockwise, beginning at twelve 

Figure 3. The percentage error for each problem in Set II of the Raven Advanced
Progressive Matrices shows the large variation in difficulty among problems with 
very similar formats. The data are from 2256 British adults, including telephone 
engineering applicants, students at a teacher training college, and British Royal Air 
Force recruits (Forbes, 1964).



Table 1: A Taxonomy of Rules in the Raven Test 

Constant in a row - the same value occurs throughout a row, but changes down a 
column. (See Figure 4b, where the location of the dark component is constant within each
row: in the top row, the location is the upper half of the diamond; in the middle row, it is
the bottom half of the diamond; and in the bottom row, it is both halves).

Quantitative pairwise progression - a quantitative increment or decrement between 
adjacent entries in an attribute such as size, position, or number. (See Figure 4a, where the
number of black squares in each entry increases along a row from 1 to 2 to 3).

Figure addition or subtraction - a figure from one column is added to (juxtaposed or 
superimposed) or subtracted from another figure to produce the third. (See Figure 4b, where
the figural element in column 1 juxtaposed to the element in column 2 produces the 
element in column 3). 

Distribution-of-three-values - three values from a categorical attribute (such as figure
type) are distributed through a row. (See Figure 2, where the three geometric forms
(diamond, square, triangle) follow a distribution rule and the three line textures (black, 
striped, clear) also follow a distribution rule). 

Distribution-of-two-values - two values from a categorical attribute are distributed 
through a row; the third value is null. (See Figure 4c, where the various figural elements
(such as the vertical line, the horizontal line, and the V in the first row) follow a 
distribution-of-two-values). 
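
The five rule types in Table 1 can be restated as simple tests on one row of three entries. The following is a minimal sketch under stated assumptions, not the representation used by FAIRAVEN or BETTERAVEN: the function names and the encodings (strings or numbers for attribute values, sets for figural elements, `None` for a null value) are hypothetical and chosen only for illustration.

```python
# Each function tests one rule type from Table 1 on a single row of three entries.
# The encodings (strings, numbers, sets, None for "null") are assumptions made
# for illustration; the test itself presents purely visual material.

def constant_in_a_row(row):
    """The same value occurs in all three entries."""
    return len(set(row)) == 1

def quantitative_pairwise_progression(row):
    """A constant, non-zero increment or decrement between adjacent entries."""
    a, b, c = row
    step = b - a
    return step != 0 and c - b == step

def figure_addition_or_subtraction(row):
    """The third entry is the union (addition) or difference (subtraction)
    of the first two; each entry is a set of figural elements."""
    a, b, c = (set(entry) for entry in row)
    return c == a | b or c == a - b

def distribution_of_three_values(row):
    """Three different non-null values, one per entry."""
    return None not in row and len(set(row)) == 3

def distribution_of_two_values(row):
    """One value occurs in exactly two entries; the third entry is null (None)."""
    present = [value for value in row if value is not None]
    return len(present) == 2 and present[0] == present[1]

# Rows modeled loosely on the examples cited in Table 1.
print(constant_in_a_row(["upper", "upper", "upper"]))
print(quantitative_pairwise_progression([1, 2, 3]))
print(figure_addition_or_subtraction([{"left"}, {"right"}, {"left", "right"}]))
print(distribution_of_three_values(["diamond", "square", "triangle"]))
print(distribution_of_two_values(["vertical line", None, "vertical line"]))
```

Each call above prints `True`. A single row can satisfy more than one test, so the tests alone do not yield a unique description of a problem.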
Figure 4a. A problem to illustrate the quantitative pairwise progression rule. The number
of black squares in the top of each row increases by one from the first to the
second column and from the second to the third column. The number of black
squares along the left remains constant within a row, but changes between rows
from three to two to one. (The correct answer is #3).
Figure 4b. A problem to illustrate the figure addition rule. The figural element in the
first column is superimposed on the figural element in the second column to
comprise the figural element in the third column. The position of the darkened
element remains constant in a row, but changes between rows from top to bottom 
to both. (The correct answer is #8). 
Figure 4c. A problem to illustrate the distribution-of-two-values rule. Each figural
element, such as the horizontal line, the vertical line, the V, and so on, occurs twice
in a row and the third value is null. (The correct answer is #5).
o'clock. Similarly, the variation described by a distribution-of-two-values rule may be
alternatively described by a figure-addition-modulo-2 rule. In the case of alternative rules, 
Appendix A lists the rules most often mentioned by the highest scoring subjects in 
Experiment 1a.
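
The two cases of interchangeable rules can be checked mechanically. The sketch below is an illustration under assumed encodings: arrow orientations are expressed in degrees clockwise from twelve o'clock, and the presence or absence of a figural element in an entry is coded as 1 or 0. The function names are hypothetical.

```python
from itertools import product

# Arrow orientation in degrees clockwise from twelve o'clock:
# twelve o'clock = 0, four o'clock = 120, eight o'clock = 240.
arrow_row = [0, 120, 240]

# Description 1: distribution-of-three-values (three distinct values).
fits_distribution_of_three = len(set(arrow_row)) == 3

# Description 2: quantitative progression (constant 120-degree clockwise rotation).
steps = [(b - a) % 360 for a, b in zip(arrow_row, arrow_row[1:])]
fits_progression = steps == [120, 120]

def fits_distribution_of_two(presence):
    """The element is present (1) in exactly two entries and absent (0) in one."""
    return sum(presence) == 2

def fits_addition_modulo_2(presence):
    """The third entry equals the sum of the first two, modulo 2."""
    a, b, c = presence
    return (a + b) % 2 == c

# Every presence pattern that fits distribution-of-two-values also fits
# figure addition modulo 2.
patterns = list(product([0, 1], repeat=3))
two_value_rows = [p for p in patterns if fits_distribution_of_two(p)]
all_covered = all(fits_addition_modulo_2(p) for p in two_value_rows)

print(fits_distribution_of_three, fits_progression, all_covered)
```

Under this encoding the modulo-2 rule also admits the pattern in which the element is absent from all three entries, so the two descriptions coincide only for elements that actually appear in the row.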

Finding corresponding elements. In problems with multiple rules, the problem solver 
must determine which figural elements or attributes in the three entries in a row are 
governed by the same rule, a process that will be called correspondence finding. For 
example, given a shaded square in one entry, the problem solver might have to decide 
which figure in another entry, either a shaded triangle or an unshaded square, is governed
by the same rule. Do the squares correspond to each other, or do the shaded figures? In 
this example, and in some of the Raven problems, the cues to the correspondence are 
ambiguous, making it difficult to tell a priori which figural elements correspond to each 
other. The correspondence finding process is a subtle source of difficulty because many 
problems seem to have been constructed by conjoining the figural elements governed by 
several rules, without much regard for the possible difficulty of conceptually segmenting the
conjunction. 

The difficulty in correspondence finding can be illustrated with an adaptation of one of 
the problems (#28, Set II), shown in Figure 5. A first plausible hypothesis about the
correspondences is that the rectangles are governed by one rule, the dark curves by another 
rule, and the straight lines by a third rule. This hypothesis reflects the use of a 
matching-names heuristic, namely, that figures with the same name might correspond to
each other. If this hypothesis is pursued further, it becomes clear that although each row 
does contain two instances of each figure type, the number and orientation of the figures 
vary unsystematically. The matching-names heuristic produces an unfruitful hypothesis
about the correspondences in this problem. A subject who has tried to apply the heuristic 
must backtrack and consider other correspondences based on some other feature, either 
number or orientation. Number, like figure identity, does not result in any economical and 
complete rule that governs location or orientation. Orientation, the remaining attribute, is 
the basis for two economical, complete rules. The horizontal elements in each row can be 
described in terms of two distribution-of-three-values rules, one governing number (1, 2 and
3 elements) and the second governing figure type (line, curve and rectangle). Similarly, the 
vertical elements in each row are governed by the same two rules. This example illustrates 
the complexity of correspondence finding, which along with the type of rule in a problem 
and the number of rules, can contribute to the difficulty of a problem. 
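
The backtracking search over candidate bases for correspondence can be sketched as follows. This is a simplified illustration, not the procedure implemented in the simulation models. The row below is a hypothetical encoding loosely modeled on the problem just described, and the completeness criterion (one element per entry in each group, with the remaining attributes each taking three distinct values) is an assumption adopted for the sketch.

```python
from collections import defaultdict

# Hypothetical row: each entry is a list of (form, orientation, number) elements.
row = [
    [("rectangle", "horizontal", 1), ("curve", "vertical", 2)],
    [("curve", "horizontal", 3), ("line", "vertical", 3)],
    [("line", "horizontal", 2), ("rectangle", "vertical", 1)],
]

ATTRS = {"form": 0, "orientation": 1, "number": 2}

def group_by(row, attr):
    """Group elements across the row by one attribute, keeping the entry index."""
    groups = defaultdict(list)
    for col, entry in enumerate(row):
        for element in entry:
            groups[element[ATTRS[attr]]].append((col, element))
    return groups

def is_complete_grouping(row, attr):
    """Accept a grouping only if every group has exactly one element per entry
    and each remaining attribute takes three distinct values within the group."""
    others = [index for name, index in ATTRS.items() if name != attr]
    for members in group_by(row, attr).values():
        cols = sorted(col for col, _ in members)
        if cols != [0, 1, 2]:
            return False
        for index in others:
            if len({element[index] for _, element in members}) != 3:
                return False
    return True

def find_correspondence(row, candidates=("form", "number", "orientation")):
    """Try each candidate basis in turn, backtracking when one fails.
    Form is tried first, mirroring the matching-names heuristic."""
    for attr in candidates:
        if is_complete_grouping(row, attr):
            return attr
    return None

print(find_correspondence(row))
```

With this row, grouping by form and by number both fail, and orientation is the first basis that yields complete rules, which parallels the sequence of hypotheses described above.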



Insert Figure 5 - correspondence problem 

In addition to variation among problems in the difficulty of correspondence finding, 
the problems also vary in the number of rules. Although John Raven intended to evaluate 
a test taker's ability to induce relations, he apparently tried to make the induction process 
more difficult in some problems by including more examples or tokens of rules. A major 
claim of the current analysis is that the presence of a larger number of rule tokens taxes
not so much the processes that induce the rules, but the goal management processes that 
are required to construct, execute and maintain a mental plan of action during the solution 
of those problems containing multiple rule tokens as well as difficult correspondence finding. 
Figure 5. A problem to illustrate characteristics that make it difficult to determine which
figural elements correspond, that is, which are operated on by the same rule.
Subjects initially assume that the rectangles correspond to each other, the dark 
curves correspond to each other, and the straight lines correspond to each other. 
But to solve the problem, subjects must backtrack and try other possible bases for 
correspondence, such as numerosity or orientation. Orientation turns out to provide 
the correct basis. The horizontal figures correspond to each other; their form 
(rectangle, dark curve, straight line) and number (1, 3, 2) are governed by
distribution-of-three-values rules. Similarly, the vertical figures correspond to each other; their
form and number are also governed by distribution-of-three-values rules. (The correct
answer is #5).
Experiment 1: Performance in the Raven test

The purpose of Experiment 1 was to collect more detailed data about the performance
in the Raven test to reveal more about the process and the content of thought during the
solving of each Raven problem. Experiments 1a and 1b examined Raven test performance
while obtaining somewhat different measures of performance. Three types of measures 
provide the basis for the quantitative evaluation of the theory. 

The first measure is the frequency and pattern of errors, which were obtained in both 
Experiments 1a and 1b. The simulation models account not only for the number of errors
that a person of a given ability will make, but also predict which types of problems he will 
fail to solve. 

The second type of measure, obtained in Experiment 1a, reflects on-line processes
used during problem solution. One such on-line measure assessed how the entries in
successive rows were visually examined. In particular, measures of the eye-fixation patterns
assessed the number of times a subject scanned a row of entries and the number of times
he looked back and forth (made paired comparisons) between entries. Another on-line
measure was the time between the successive statements of rules uttered by subjects who 
were talking aloud while solving the problems. These on-line measures constrain the type 
of solution processes postulated in the simulations. 

A third measure, obtained in Experiment 1b, is the subjects' descriptions of the rules
that they induced in choosing a response to each problem. The subjects' rules are 
compared to the rules induced by the simulation models. 

Method 

Procedure for Experiment 1a. In Experiment 1a, the subjects were presented with
problems from the Raven test while their eye fixations were recorded. They were asked to 
talk out loud while they solved the problems, describing what they noticed and what 
hypotheses they were entertaining. The subjects were given the standard psychometric 
instructions and shown two simple practice problems. One deviation from standard 
psychometric procedure was that subjects were told to pace themselves so as to attempt all
of the problems in the standard 40-minute time limit.

Stimuli. Experiment 1a used 34 of the 48 problems in Sets I and II that could be
represented and displayed within the raster graphics resolution of our display system, which
was 512 x 512 pixels (see Just & Carpenter, 1979, for a description of the video
digitization and display characteristics). The stimuli were created by digitizing the video 
image of each problem in the Raven test booklet. Appendix A shows the sequence number 
in the Raven test of the problems that were retained. The problems that could not be 
adequately digitized were those with very high spatial frequencies in their depiction, such as 
small grids or cross-hatching (Set II #2, 11, 15, 20, 21, 24, 25, 28, 30). There was little
relation between the presence of high spatial frequencies and a problem's difficulty as
indicated by the normative error rate from Forbes (1964) shown in Figure 3.

Eye fixations. The subjects' eye fixations were monitored remotely with an Applied
Science Laboratories corneal and pupil-centered eye-tracker that sampled at 60 Hz,
ultimately resulting in an x-y pair of gaze coordinates expressed in the coordinate system of 
the display. The individual x-y coordinates were later aggregated into fixations. Then, 
successive fixations on the same one of the nine entries in the problem matrix or on a 
single response alternative were aggregated together into units called gazes, which constitute 
the main eye-fixation data base. 
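
The second aggregation step, from fixations to gazes, amounts to merging runs of consecutive fixations that fall on the same entry or response alternative. A minimal sketch follows, assuming the fixations have already been assigned to regions. The function name, the region labels, the durations, and the tuple layout are all hypothetical and are not taken from the actual data-reduction software.

```python
from itertools import groupby

def aggregate_gazes(fixations):
    """Merge consecutive fixations on the same region into gazes.

    `fixations` is a sequence of (region, duration_ms) pairs in temporal order.
    Returns a list of (region, total_duration_ms, n_fixations) tuples.
    """
    gazes = []
    for region, run in groupby(fixations, key=lambda fixation: fixation[0]):
        durations = [duration for _, duration in run]
        gazes.append((region, sum(durations), len(durations)))
    return gazes

# Hypothetical fixation record; region labels and durations are illustrative only.
fixations = [
    ("row1-col1", 200), ("row1-col1", 250), ("row1-col2", 300),
    ("row1-col1", 180), ("alt-5", 220), ("alt-5", 210),
]
gazes = aggregate_gazes(fixations)
print(gazes)
```

Because only consecutive fixations are merged, a return to a previously inspected entry starts a new gaze, which is what allows paired comparisons between entries to be counted from the gaze sequence.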

Procedure for Experiment 1b. Unlike Experiment 1a, in which subjects gave verbal
protocols while they solved each problem, in Experiment 1b subjects were asked to work
silently, make their response, and then describe the rules that motivated their final
response. This change in procedure was intended to provide more complete information 
about what rules the subjects induced. These rule statements were then compared to the 
rules induced by FAIRAVEN and BETTERAVEN. Subjects were given 40 problems, 
approximately half of which were from the Raven Progressive Matrices test and half from
the Standard Progressive Matrices Test, involving similar rule types, to increase the number 
of problems involving more difficult rules. The subjects in Experiment 1b were tested in
two sessions separated by about a week with 20 items in each session. 

Subjects. In Experiment 1a, the subjects were 12 Carnegie Mellon students who
participated for course credit. In Experiment 1b, the subjects were 22 students from
Carnegie Mellon and the University of Pittsburgh who participated for a $10 payment.
Data were not included from three additional subjects who did not return for the second
session to complete Experiment 1b.

Overview of Results 

Errors, eye fixations and verbal reports. This overview presents the general patterns 
of results, particularly results that influenced the design features of the simulation models. 
This overview, presented in preliminary and qualitative terms, will be followed by a more 
precise analysis of the data in Part III, after the presentation of the models. 

In Experiment 1a, over all 34 problems, the number of errors per subject ranged
from 2 to 20, with a mean of 10.5 (31%), and a median of 10.3. Although our college
student subjects had a lower mean error rate than Forbes' more heterogeneous sample, the
correlation between the error rates of our sample and Forbes' on the 27 problems in Set II
was high, r(25) = .91. In Experiment 1b, the mean number of errors for the 40 Raven
problems was 11.1 (28%), with a median of 10 errors.

The error rate on a given problem was related to the types of rules it involved, and 
the number of tokens of each rule type. A simple linear regression whose single 
independent variable was the total number of rules in a problem (irrespective of whether 
they were of similar or different types, and not counting any constant rules) accounted for 
57% of the variance among the mean error rates in Experiment 1a for the 32 problems 
classified within our taxonomy. (If any constant rules are counted in with the number of 
rule tokens in a problem, then the percentage of variance accounted for declines to 45%.) 
The median and mean response times for correct responses were generally longer for the 
problems that had higher error rates (with a correlation of .87 between the mean times and 
the errors), suggesting that problem difficulty affected both performance measures. 
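
To make "percentage of variance accounted for" concrete, the following minimal sketch computes R-squared for a regression with a single predictor. The function name and the toy numbers are illustrative assumptions only; they are not the data from Experiment 1a.

```python
from statistics import mean

def r_squared(x, y):
    """Proportion of the variance in y explained by a one-predictor linear fit on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # For a single predictor, R-squared equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)

# Hypothetical toy values (NOT the experimental data):
rule_tokens = [1, 2, 3]   # number of rule tokens per problem
error_rate = [1, 3, 2]    # made-up error scores per problem
print(r_squared(rule_tokens, error_rate))
```

Under this definition, the reported 57% corresponds to R-squared = .57, that is, a correlation of roughly .75 between the number of rule tokens and the mean error rate.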

Perhaps the most striking facet of the eye fixations and verbal protocols was the 
demonstrably incremental nature of the processing. The way that the subjects solved a 
problem was to decompose it into successively smaller subproblems, and then proceed to 
solve each subproblem. The induction of the rules was incremental, in two respects. First 
of all, in problems containing more than one rule, the rules were described one at a time, 
with long intervals between rule descriptions, suggesting that they were induced one at a 
time. Second, the induction of each rule consisted of many small steps, reflected in the 
pairwise comparison of elements in adjoining entries. These aspects of incremental 
processing were ubiquitous characteristics of the problem-solving of all of the subjects, and 
do not appear to be a source of individual differences. Consequently, the incremental 
processing played a large role in the design of both simulation models. 

A typical protocol from one of the subjects illustrates the incremental processing. 
Table 2 shows the sequence of gazes and verbal comments made by an average subject 
(41% errors) solving a problem involving two distribution-of-three-values rules and a constant 
in a row rule (Set II #1, which is isomorphic to the problem depicted in Figure 2). The 
subject's comments are transcribed adjacent to the gazes that occurred during the utterance. 
(The subject's actual comments were translated to refer to the isomorphic attributes 
depicted in Figure 2.) The location of each gaze is indicated by labeling the rows in the 
matrix from top to bottom as row 1, 2 and 3, and the columns from left to right as 1, 2 
and 3, such that (1,2) designates the entry in the top row and middle column. The braces 
encompassing a sequence of gazes indicate how the gazes were classified in the analysis 
that counted the number of scans of rows and columns. The duration of each gaze is 
indicated in milliseconds next to the location of the gaze. 



Insert Figure 6 and Table 2 below it 

The verbal report shows that the subject mentioned one attribute at a time, with 
some time interval between the mentions, suggesting that the representation of the entries 
was being constructed incrementally. Also, the subject described the rules one at a time, 
typically with several seconds elapsing between rules. The subject seemed to construct a 
complete representation attribute by attribute, and induced the rules one at a time. 

The incremental nature of the process is also apparent in the pattern of gazes, 
particularly the multiple scans of rows and columns and the repeated fixations of pairs of 
related entries. These scans are apparent in the sequence of gazes shown in Figure 6. 
(The numbers indicating the sequence of gazes have been placed in columns to the right of 
the fixated entries and lines have been drawn to connect the successive fixations of entries 
within rows). This protocol indicates the large amount of pairwise and row-wise scanning. 
For example, like most of the eye fixation protocols, this one began with a sequence of 
pairwise gazes on the first two entries in the top row. The subject was presumably encoding 
some of the figural elements in these two entries and comparing their attributes. Then, the 
subject went on to compare the middle and right-most entries of the top row, followed by 
several scans of the complete row. 

The general results, then, are that the processing is incremental, that the number of 
rule tokens affects the error rates, and that there is a wide range of differences among 
individuals in their performance on this test. 

Experiment 2: Goal management in other tasks 

The finding that error rates increase with the number of rule tokens in a problem 
suggests that the sheer keeping track of figural attributes and rules might be a substantial 
source of individual differences in the Raven test. "Keeping track" refers to the ability to 
generate subgoals in working memory, record the attainment of subgoals, and set new 
subgoals as others are attained. Subjects who are successful at goal management in the 
Raven test should also perform well on other cognitive tasks involving extensive goal 
management. One such task is a puzzle called the Tower of Hanoi, which can be solved 
using a strategy that requires considerable goal management. Most research on the Tower 
of Hanoi puzzle has focused on how subjects induce a correct strategy. By contrast, in the 
current study, the inductive aspect of the puzzle was minimized by teaching subjects a 
strategy beforehand, with extensive instructions and practice. Errors on the Tower of 
Hanoi puzzle should correlate with errors on the Raven test, to the extent that both 
require goal management. 

The Tower of Hanoi puzzle consists of three pegs and three or more disks of 
increasing size arranged on one of the pegs in the form of a pyramid, with the largest disk 
on the bottom and smallest disk on the top, as shown in the top part of Figure 7. The 
subject's task is to reconstruct the pyramid, moving one disk at a time, on another peg 
(called the goal peg), without ever putting a larger disk on a smaller disk. One of the 
most commonly used strategies in the puzzle is called the goal-recursion strategy (Simon, 
1975; Kotovsky, Hayes & Simon, 1986). With this strategy, the puzzle is solved by first 
setting the goal of moving the largest disk from the bottom of the pyramid on the source 
peg to the goal peg. But before executing that move, the disks constituting the 
sub-pyramid above the largest disk must be moved out of the way. This goal is recursive 
because in order to move the sub-pyramid, its largest disk must be cleared, and so on. 
Thus, to execute this strategy, a subject must set up a number of embedded subgoals. As 
the number of disks in a puzzle increases, the subject must generate a successively larger 
hierarchy of subgoals and remember his/her place in the hierarchy while executing 
successive moves. 

Figure 6. The sequence of gazes shown in the protocol in Table 2. Gazes within the 
same row are connected. The numbers in parentheses indicate the location of a 
gaze that followed if it was in a different row. 

Table 2 

The locations and durations of gazes in a typical protocol 
(Subject 5, Raven Set II, problem number 1) 

Gaze No.  Scan                    Location    Duration 
                                  (Row, Col)  (msec) 

 1        Pairwise (1,1)-(1,2)    1,2         233 
 2                                1,1         367 
 3                                1,2         533 
 4                                1,1         117 
 5        Row 1                   1,3         434 
 6                                1,2         367 
 7                                1,1         516 
 8        Row 1                   1,2         400 
 9                                1,3         517 
10                                1,2         550 
11                                1,1         383 
12        Row 1                   1,2         517 
13                                1,3         285 
14                                1,2         599 
15        Pairwise (1,1)-(1,2)    1,1         533 
16                                1,2         468 
17                                2,3         284 
18                                3,2         317 
19                                1,2         434 
20                                1,1         533 
21                                2,1         434 
22        Row 2                   2,2         451 
23                                2,3         467 

Verbal comments during gazes 1-23: 
    Okay, 
    there's diamond, square, triangle 
    and they each contain lines through them 

24                                2,2         467 
25                                2,1         167 
26        Row 3                   3,1         233 
27                                3,2         267 
28                                1,2         483 
29        Col 2                   2,2         599 
30                                3,2         300 
31                                2,3         167 
32        Row 3                   3,2         133 
33                                3,1         650 
34                                3,2         433 
35                                3,3         432 
36        Row 3                   3,2         167 
37                                3,1         400 
38                                2,3         217 
39                                3,2         583 
40        Pairwise (2,2)-(3,1)    2,2         583 
41                                3,1         334 
42                                2,2         267 
43        Row 3                   3,2         383 
44                                3,1         234 
45        Answers                 #7          183 
46                                #4          467 
47                                3,3         217 
48        Row 3                   3,2         199 
49                                3,1         183 

Verbal comments during gazes 24-49: 
    with different shadings 
    going from vertical, horizontal, oblique 
    and the third one should be 

50        Diagonal                2,2         350 
51                                1,3         150 
52                                3,1         433 
53                                1,2         234 
54                                2,1         117 
55        Row 3                   3,2         417 
56                                3,1         366 
57        Answers                 (illegible) 250 
58                                #5          1900 
59                                #1          250 
60                                (illegible) 184 

Verbal comments during gazes 50-60: 
    Okay, it should be a square 
    And should have the 
    black line in them 
    and the answer's 5. 

Moves in the Tower of Hanoi puzzle can be organized within a goal tree which 
specifies the subgoals that must be generated or retained on each move, as pointed out by 
Egan and Greeno (1974). Figure 7 shows a diagram of the goal tree when the 
goal-recursion strategy is used in a four-disk problem. Each branch corresponds to a subgoal 
and the terminal nodes correspond to individual moves. The subject can be viewed as 
doing a depth-first search of the goal tree; in the goal-recursion strategy, the subject is 
taught to generate the subgoals equivalent to those listed in the left-most branch to enable 
the first move. Subsequent moves entail either maintaining, generating or attaining various 
subgoals. In particular, on moves 1, 5, 9, and 13, the claim is that the subject should 
generate one or more subgoals before executing the move; by contrast, no new subgoals 
need to be generated before other moves. Egan and Greeno (1974) found that the likelihood of 
an error on a move increased with the number of goals that had to be maintained or 
generated to enable that move. Consequently, performance on the Tower of Hanoi 
goal-recursion strategy should correlate with performance on the Raven test, to the extent that 
both tasks rely on generating and maintaining goals in working memory. 
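
The goal-recursion strategy and its goal tree can be sketched as a short recursive program. This is only an illustration under stated assumptions: the function name, the peg labels, and the convention of counting one new subgoal for every embedded sub-pyramid of two or more disks are ours, not a specification taken from the experiment software.

```python
def goal_recursion_trace(n_disks, source="A", goal="C", spare="B"):
    """Return (disk, from_peg, to_peg, new_subgoals) for each move, in order.

    new_subgoals is the number of embedded sub-pyramid goals (two or more
    disks) set up since the previous move; this counting rule is an assumption.
    """
    moves = []
    pending = 0  # subgoals generated since the last executed move

    def move_pyramid(height, src, dst, via, top_level=False):
        nonlocal pending
        if height == 0:
            return
        if height >= 2 and not top_level:
            pending += 1                            # a new embedded subgoal
        move_pyramid(height - 1, src, via, dst)     # clear the sub-pyramid
        moves.append((height, src, dst, pending))   # move the largest disk
        pending = 0
        move_pyramid(height - 1, via, dst, src)     # rebuild the sub-pyramid

    move_pyramid(n_disks, source, goal, spare, top_level=True)
    return moves

trace = goal_recursion_trace(4)
print(len(trace))                                            # 15
print([i for i, m in enumerate(trace, start=1) if m[3] > 0]) # [1, 5, 9, 13]
```

With this counting rule, the depth-first traversal of the four-disk tree yields 15 moves, and new subgoals are generated only before moves 1, 5, 9, and 13, in line with the description above.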



Insert Figure 7 - Goal tree and Tower of Hanoi 



Method 

Procedure. The subjects were administered the Raven Progressive Matrices Test, Sets 
I and II, using standard psychometric procedures. Then the subjects were given extensive 
instruction and practice on the goal-recursion strategy in 2-disk and 3-disk versions of the 
Tower of Hanoi puzzle. Finally, all subjects were given Tower of Hanoi problems of 
increasing size, from 3 disks to 8 disks, although several subjects were unable to complete 
the 8-disk puzzle and so the data analysis concerns only the 3-disk to 7-disk problems. 
The total number of moves required to solve a puzzle with N disks using goal recursion is 
2^N - 1. The start and goal pegs for each size puzzle were selected at random from trial to 
trial. Between trials, subjects were reminded to use the goal-recursion strategy and they 
were questioned at the end of all of the trials to ensure that they had complied. In place 
of a physical Tower of Hanoi, subjects saw a computer-generated (Vaxstation II) graphic 
display, with disks that moved when a source and destination peg were indicated with a 
mouse. Subjects seldom attempted an illegal move (placing a larger disk on a smaller 
disk), but on those few occasions they tried, it was disallowed by the program. If subjects 
made a move that was inconsistent with the goal-recursion strategy (and hence would not 
move them toward the goal), the move was signaled as an error by a computer tone, and 
the subject was instructed to undo the erroneous move before making the next move. 
Thus, subjects could not stray more than one move from the optimal solution path. The 
main dependent measure was the total number of errors, that is, moves that were 
inconsistent with the goal-recursion strategy. 

Figure 7. The goal tree generated by the goal-recursion strategy for the four-disk 
Tower-of-Hanoi puzzle. The tree is traversed depth-first, from left to right, generating the 
15 moves. 



Subjects. The subjects were 45 students from Carnegie Mellon, the University of 
Pittsburgh, and Community College of Allegheny County who participated for $10 payment. 
They took the Raven Advanced Progressive Matrices test and solved the Tower of Hanoi 
puzzles. 

Results and Discussion 

Because of its extensive dependence on goal management, overall performance of the 
goal-recursion strategy in the Tower of Hanoi puzzle was predicted to correlate highly with 
the Raven test. Consistent with this hypothesis, the correlation between errors on the 
Raven test and total number of errors on the six Tower of Hanoi puzzles was r(43) = .77, 
p < .01, a correlation that is close to the test-retest reliability typically found for the 
Raven test (Court & Raven, 1982). A subanalysis of the higher-scoring subjects was also 
performed because many analyses that follow later in this paper deal primarily with 
students who score in the upper half of our college sample on the Raven test. The 
subanalysis was restricted to subjects whose Raven scores were within one standard 
deviation of the mean Raven score in Experiment 1a or above, eliminating nine low-scoring 
subjects (scores between 12-17 points on the Raven test). Even with this restricted range, 
the correlation between errors on the Tower of Hanoi puzzles and the Raven test for the 
34 students with scores of 20 or higher was highly significant, r(32) = .57. These 
correlations support the thesis that the execution of the goal-recursion strategy in the 
Tower of Hanoi puzzle and performance on the Raven test are both related to the ability to 
generate and maintain goals in working memory. 

A more specific prediction of the theory is that errors on the Tower of Hanoi puzzle 
should occur on moves that impose a greater burden on working memory and that the 
effect should depend, in part, on the capacity to maintain goals in working memory, as 
assessed by the Raven test. These predictions were supported, as shown in Figure 8. 
Figure 8 shows the probability of an error on moves that require the generation of 0, 1, or 
2 or more subgoals; the four curves are for subjects who are classified according to their 
Raven test score. As Figure 8 indicates, the error rates were low and comparable for 
moves that did not require the generation of additional subgoals; by contrast, lower-scoring 
subjects made significantly more errors as the number of subgoals to be generated 
increased, as reflected in an interaction between the subject groups and whether there were 
0 or 1 or more subgoals to be generated, F(3,32) = 3.57, p < .05. Figure 8 also shows 
that the best performance was obtained by subjects with the best Raven test performance, 
F(3,32) = 3.53, p < .05, and that the probability of an error increases with the number of 
subgoals to be generated in working memory, F(2,64) = 77.04, p < .01. This pattern of 
results supports the hypothesis that errors in the Tower of Hanoi puzzle reflect the 
constraints of working memory; consequently, its correlation with the Raven test supports 
the theory that the Raven test also reflects the ability to generate and maintain goals in 
working memory. 



Insert Figure 8 - Tower of Hanoi data 

Because the high correlation between the two tasks accounts for most of the reliable 
variance in the Raven test, it raises the question of whether there is any need to postulate 
abstraction as an additional source of individual differences in the Raven test. But using 
goal-recursion in the Tower of Hanoi puzzle involves some abstraction to recognize each of 
the many configurations of sub-pyramids to which the strategy should be applied. Thus, 
the high correlation probably reflects some shared abstraction processes as well as goal 
generation and management. 




Figure 8. The probability of an error for moves in the Tower of Hanoi puzzle as a 
function of the number of subgoals that are generated to enable that move. The 
curves represent subjects in Experiment 2 sorted according to their Raven test 
scores, from best (33-36 points) to low-median (20-25 points) performance. 
(Horizontal axis: number of generated subgoals, 0, 1, or 2+.) 

The Raven test correlates with other cognitive tests that differ from it in form and 
content, but like the Raven test, appear to require considerable goal management. One 
example of such a test is an alphanumeric series completion test, which requires the subject 
to determine which letter or number should occur next in a series, as in: 

1B3D5G7K?? (The answer is 9 P) 

Such correlations may reflect the fact that both tasks involve considerable goal 
generation and management. A theoretical analysis of the series completion task by 
Kotovsky & Simon (1973; Simon & Kotovsky, 1963; Williams, 1972) indicated that the 
series completion test, like the Raven test, requires correspondence finding, pairwise 
comparison of adjacent corresponding elements, and the induction of rules based on patterns 
of pairwise similarities and differences. The general similarity of the underlying processes 
leads to the prediction of correlated performance in the two tasks despite the minimal 
visuo/spatial pattern analysis in the series completion task. This construal of the correlation 
is further supported by the fact that some of the sources of individual differences in the 
series completion task are known and converge with our analysis of individual differences in 
the Raven test. Applying the Simon and Kotovsky (1963, 1973) model to analyze the 
working memory load imposed by different types of series completion problems, it was 
found that problems involving larger working memory loads differentiated between bright 
and average-IQ children more than easier problems; this difference suggests that the ability 
to handle larger memory loads in the series completion task correlates with IQ (Holzman, 
Pellegrino & Glaser, 1983). These correlations, as well as the correlation between the 
Raven test and the Tower of Hanoi puzzle, strongly suggest that a major source of 
individual differences in the Raven test is due to the generation and maintenance of goals 
in working memory. 
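
The three processes named above (correspondence finding, pairwise comparison, rule induction) can be illustrated on the series completion example with a minimal sketch. It is not the Simon and Kotovsky model; the function names are hypothetical, and the code assumes a strictly alternating series of single digits and capital letters whose first or second differences are constant.

```python
def extend(values):
    """Predict the next value from constant first or second pairwise differences."""
    d1 = [b - a for a, b in zip(values, values[1:])]   # pairwise comparisons
    if len(set(d1)) == 1:                              # rule: constant step
        return values[-1] + d1[0]
    d2 = [b - a for a, b in zip(d1, d1[1:])]
    if len(set(d2)) == 1:                              # rule: step grows steadily
        return values[-1] + d1[-1] + d2[0]
    raise ValueError("no rule induced")

def complete_series(series):
    """Split the series into its two corresponding sub-series and extend each."""
    digits = [int(c) for c in series[0::2]]                  # correspondence: numbers
    letters = [ord(c) - ord("A") + 1 for c in series[1::2]]  # correspondence: letters
    next_digit = extend(digits)
    next_letter = chr(extend(letters) + ord("A") - 1)
    return str(next_digit), next_letter

print(complete_series("1B3D5G7K"))  # ('9', 'P')
```

The solver has to keep two sub-series and their separate rules in play at once, which is the kind of bookkeeping the text attributes to goal management.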



PART II: THE SIMULATION MODELS 

In this section, we first describe the FAIRAVEN model which performs comparably to 
the median college student in our sample, already a rather high level of performance 
relative to the population norms. Then, we will describe the changes required to improve 
FAIRAVEN's performance to the highest level attained by our subjects, as instantiated by 
the BETTERAVEN model. 

Overview. The primary goal in developing the simulation models was to specify the 
processes required to solve the Raven problems. In particular, the simulations should make 
explicit what distinguishes easier problems from harder problems, and correspondingly, what 
distinguishes among individuals of different ability levels. The simulations were designed to 
perform in a manner indicated by the performance characteristics observed in Experiment 
1a, namely incremental, re-iterative representation and rule induction. 

The general outline of how the model should perform is as follows. The model 
encodes some of the figures in the first row of entries, starting with the first pair of 
entries. The attributes of the corresponding figures are compared, the remaining entry is 
encoded and compared with one of the other entries, and then the pattern of similarities 
and differences that emerges from the pairwise comparisons is recognized as an instance of 
a rule. In problems involving more than one rule, the model must determine which figural 
elements are governed by a common rule. The representation is constructed incrementally 
and the rules are induced one by one. This process continues until a set of rules has been 
induced that is sufficient to account for all the variation among the entries in the top row. 
The second row is processed similarly, and in addition, a mapping is found between the 
rules for the second row and their counterparts in the first row. The rules for the top two 
rows are expressed in a generalized form and applied to the third row to generate the 
figural elements of the missing entry, and the generated missing entry is selected from the 
response alternatives. 

The programming architecture. Both FAIRAVEN and BETTERAVEN are written as 
production systems, a formalism that was first used for psychological modeling by Newell 
and Simon and their colleagues (Newell, 1973; Newell & Simon, 1972). In a production 
system, procedural knowledge is contained in modular units called productions, each of 
which specifies what actions are to be taken when a given set of conditions arises in 
working memory. Those productions whose conditions are met by the current contents of 
working memory are enabled to execute their actions, and they thereby change the contents 
of working memory (by modifying or adding to the contents). The new status of working 
memory then enables another set of productions, and so another cycle of processing starts. 
All production systems share these control principles, although they may differ along many 
other dimensions (see Klahr, Langley & Neches, 1987). 
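
The recognize-act cycle described above can be sketched in a few lines. This is a generic illustration, not code from FAIRAVEN or OPS4: the `Production` type, the working-memory strings, and the "first listed production wins" conflict-resolution rule are all assumptions made for the example.

```python
from typing import NamedTuple

class Production(NamedTuple):
    name: str
    needs: frozenset   # elements that must be present in working memory
    adds: frozenset    # elements the action deposits in working memory

def run_serial(productions, working_memory, max_cycles=20):
    """Conventional cycle: match all productions, then fire exactly one."""
    wm = set(working_memory)
    fired = []
    for _ in range(max_cycles):
        enabled = [p for p in productions
                   if p.needs <= wm and not p.adds <= wm]
        if not enabled:
            break
        chosen = enabled[0]      # assumed conflict resolution: first listed
        wm |= chosen.adds        # the action changes working memory
        fired.append(chosen.name)
    return wm, fired

# Hypothetical productions, loosely named after the stages in the text.
rules = [
    Production("encode", frozenset({"goal:analyze-row"}),
               frozenset({"percept:entry-1", "percept:entry-2"})),
    Production("compare", frozenset({"percept:entry-1", "percept:entry-2"}),
               frozenset({"difference:numerosity+1"})),
    Production("induce", frozenset({"difference:numerosity+1"}),
               frozenset({"rule:progression"})),
]
wm, fired = run_serial(rules, {"goal:analyze-row"})
print(fired)  # ['encode', 'compare', 'induce']
```

Each firing changes working memory, which in turn enables the next production; no production calls another directly.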

The particular production system architecture used for these simulations is CAPS (for 
Concurrent, Activation-based Production System) (Just & Carpenter, 1987; Just & Thibadeau, 
1984; Thibadeau, Just & Carpenter, 1982). Even though CAPS was constructed on top of a 
conventional system, OPS4 (Forgy & McDermott, 1977), it deviates in several ways from 
conventional production systems. One distinguishing property is that on any given cycle, 
CAPS permits all the productions whose conditions are satisfied to be enabled in parallel 
with each other. Thus CAPS has the added capability of parallelism, in addition to the 
inherent seriality of a production system. By contrast, conventional production systems 
enable only one production per cycle, regardless of how many of them have had their 
conditions met, requiring some method for arbitrating among satisfied productions. Another 
distinguishing property of CAPS is that knowledge elements can have varying degrees of 
activation, whereas in conventional systems, elements are either present or absent from 
working memory. Other properties of CAPS, not used in the present applications, are 
described elsewhere (Just & Thibadeau, 1984; Thibadeau, Just & Carpenter, 1982). 
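
The two properties singled out here, parallel firing and graded activation, can be illustrated with a toy loop. It is emphatically not CAPS: the dictionary format for productions, the activation threshold of 1.0, and the additive weights are assumptions chosen only to show the contrast with a one-production-per-cycle system.

```python
def run_parallel(productions, activation, threshold=1.0, max_cycles=20):
    """Each cycle, fire every production whose conditions are above threshold."""
    act = dict(activation)
    history = []
    for _ in range(max_cycles):
        snapshot = dict(act)     # all productions match against the same state
        enabled = [
            p for p in productions
            if all(snapshot.get(e, 0.0) >= threshold for e in p["needs"])
            and snapshot.get(p["target"], 0.0) < threshold
        ]
        if not enabled:
            break
        for p in enabled:        # no arbitration: all enabled productions fire
            act[p["target"]] = act.get(p["target"], 0.0) + p["weight"]
        history.append(sorted(p["name"] for p in enabled))
    return act, history

# Hypothetical productions; names and weights are illustrative only.
rules = [
    {"name": "note-shape", "needs": ["row-encoded"],
     "target": "shape-varies", "weight": 1.0},
    {"name": "note-texture", "needs": ["row-encoded"],
     "target": "texture-varies", "weight": 0.5},
    {"name": "induce", "needs": ["shape-varies", "texture-varies"],
     "target": "two-rules", "weight": 1.0},
]
act, history = run_parallel(rules, {"row-encoded": 1.0})
print(history)  # [['note-shape', 'note-texture'], ['note-texture'], ['induce']]
```

In the first cycle two productions fire together, and the weaker one needs a second cycle before its element reaches threshold, which is how graded activation differs from an all-or-none working memory.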

FAIRAVEN 

FAIRAVEN consists of 121 productions which can be roughly divided into three 
categories: perceptual analysis, conceptual analysis and responding. These three categories, 
which respectively account for approximately 48%, 40% and 12% of all the productions, are 
indicated in the block diagram in Figure 9. The productions that constitute the perceptual 
analyzer simulate some aspects of the visual inspection of the stimulus. These productions 
access information about the visual display from a stimulus description file and bring this 
information into working memory as percepts. These productions also notice some relations 
among percepts. The productions in the conceptual analyzer try to account for the 
variation among the entries in one or more rows by inducing rules that relate the entries. 
The responder uses the induced rules to generate a hypothesis about what the missing 
matrix entry should be and it then determines which of the eight response alternatives best 
fits that hypothesis. The next sections describe each of the three categories in more detail. 
This description is followed by an example of how FAIRAVEN solved the problem shown in 
Figure 2. 



Insert Figure 9 - FAIRAVEN modules 

Perceptual analysis 

FAIRAVEN operates on a stimulus description that consists of a hand-coded, symbolic 
description of each matrix entry. Thus, the visual encoding processes that generate the 
symbolic representation lie outside the scope of the model. This incompleteness does not 
compromise our analysis of individual differences, for three reasons. First, the high 
correlations between the Raven test and other non-visual tests (such as alphanumeric series 
completion and verbal analogies, shown in Figure 1a) indicate that visual encoding processes 
are not a major source of individual differences. Second, our protocol studies of the Raven 
test suggest that subjects have no difficulty perceiving and encoding the figures in each 
entry of a problem, such as squares, lines, angles, and so on. Third, the protocols indicate 
that the subjects do have difficulty determining the correspondences among figures and 
their attributes, a process that lies within the scope of the model. 

Figure 9. A block diagram of FAIRAVEN. The perceptual analysis productions, 
conceptual analysis productions, and response generation productions all interact 
through the contents of working memory. The perceptual analysis productions 
accept stimulus descriptions and generate a list of simulated fixations. 

Stimulus descriptions. The perceptual analysis productions operate on a symbolic 
description of each matrix entry and response alternative. To generate these descriptions, an 
independent group of subjects was asked to describe the entries in each problem, one entry 
at a time, without any problem-solving goal. The modal verbal descriptions served as the 
basis for the stimulus descriptions. The typical descriptions were in terms of basic-level 
figures (Rosch, 1975) and their attributes, such as a square, a line, striped, and so on. For 
example, the entry in the upper left of the matrix shown in Figure 2 would be described 
as a concatenation of two figures, a diamond and a line, with the line having the attributes 
of orientation (vertical) and texture (dark). The stimulus description of some figures 
contained an additional level of detail that was accessed if the base-level description was 
insufficient to establish correspondences, as in the case of embedded figures. 

The perceptual analysis is done by three subgroups of productions that (1) encode the 
information about the figures, (2) determine the correspondences and (3) compare the figures 
in adjacent entries to obtain a pattern of pairwise similarities and differences. Each 
subgroup is described in turn. 

Encoding productions. These productions, the only access path to the stimulus 
information, transfer some or all of the information from the description file into working 
memory when such information is requested. If the entries in a given problem contain 
figures with several attributes, then FAIRAVEN will go through multiple cycles of 
perceptual analysis of the entries in a row, until all the attributes have been analyzed. 
This behavior of the model was intended to express the incremental processing and 
re-iterative scanning of the entries that was evident in the human eye fixation patterns. 
Some of the simulated inspections of the stimulus, like the initial inspection of an entry, 
are data-driven. If an entry's position in the matrix is specified, one of the encoding 
productions returns the names of each figure in that entry and the number of figures, but 
not any attribute information. Other inspections can be driven by a specific conceptual 
goal, such as the need to determine attributes of a particular figure. If an entry's position 
and the name of a figure are specified, one of the encoding productions returns an attribute 
of the figure and, if requested, its value. These encoding productions, which are more 
conceptually driven, are evoked after hypotheses are formulated in the course of inducing 
and verifying rules. 
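
The two kinds of inspection just described can be sketched as two access functions over a symbolic stimulus description. The dictionary layout and the function names are hypothetical; only the content of the example entry (a diamond plus a line that is vertical and dark) comes from the description of the upper-left entry of Figure 2.

```python
# Hypothetical encoding of one matrix entry, keyed by (row, column).
STIMULUS = {
    (1, 1): {
        "diamond": {},
        "line": {"orientation": "vertical", "texture": "dark"},
    },
}

def encode_entry(position):
    """Data-driven inspection: figure names and their number, no attributes."""
    figures = STIMULUS[position]
    return list(figures), len(figures)

def encode_attribute(position, figure, known=(), want_value=False):
    """Goal-driven inspection: return one attribute not yet analyzed,
    together with its value only if the value is requested."""
    for name, value in STIMULUS[position][figure].items():
        if name not in known:
            return (name, value) if want_value else (name, None)
    return None  # every attribute of this figure has been analyzed

print(encode_entry((1, 1)))                      # (['diamond', 'line'], 2)
print(encode_attribute((1, 1), "line"))          # ('orientation', None)
print(encode_attribute((1, 1), "line", known=("orientation",), want_value=True))
```

Because each call yields at most one attribute, several passes over the same entry are needed before a figure is fully described, mirroring the repeated cycles of perceptual analysis in the text.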

Finding correspondences between figures. In most problems, because more than one 
rule is operating, it is necessary to conceptually group the figures in a row that are 
operated on by each rule. The main heuristic procedure that subjects seem to use is to 
hypothesize that figures having the same name (e.g. line) should be grouped together. 
Similarly, FAIRAVEN uses a matching-names heuristic, which hypothesizes that figures 
having the same name correspond to each other. A second heuristic rule used by 
FAIRAVEN is the matching-leftovers heuristic, which hypothesizes that if all but one of the 
figures (or attributes) in two adjacent entries have been grouped, then those leftover figures 
(or attributes) correspond to each other. For example, for the problem depicted in Figure 
2, the matching-names heuristic hypothesizes the correspondence among the three lines and 
the matching-leftovers heuristic hypothesizes correspondence among the geometric figures 
that are leftover in each entry. 

FAIRAVEN also tries to establish correspondences between the figures in different 
rows by expressing how the rules from a previous row account for the variation in the new 
row, usually by generalizing the rule. 
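
The matching-names and matching-leftovers heuristics can be sketched as follows. This is a simplified illustration, assuming that each entry is given as a plain list of figure names; the function name and return format are hypothetical.

```python
def find_correspondences(entry_a, entry_b):
    """Pair figures in two adjacent entries; each pair records the heuristic used."""
    pairs = []
    rest_a, rest_b = list(entry_a), list(entry_b)
    # Matching-names heuristic: figures with the same name correspond.
    for name in list(rest_a):
        if name in rest_b:
            pairs.append((name, name, "matching-names"))
            rest_a.remove(name)
            rest_b.remove(name)
    # Matching-leftovers heuristic: one ungrouped figure remains on each side.
    if len(rest_a) == 1 and len(rest_b) == 1:
        pairs.append((rest_a[0], rest_b[0], "matching-leftovers"))
    return pairs

# First two entries of the top row of the Figure 2 problem, as named in the text.
print(find_correspondences(["diamond", "line"], ["square", "line"]))
# [('line', 'line', 'matching-names'), ('diamond', 'square', 'matching-leftovers')]
```

When more than one figure is left over on either side, the sketch proposes nothing further, which is one way to picture the limited correspondence-finding ability attributed to FAIRAVEN.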

Pairwise comparison. The pairwise comparison productions perform the fundamental 
perceptual comparisons between figures or attributes that are hypothesized to correspond to 
each other, and thus provide the data-base for the conceptual processing. These productions 
determine whether the elements are the same or different with respect to one of their 
attributes. For example, consider a row of three entries consisting of successive sets of 
circles: o oo ooo. By comparing the circle in the first entry with the two circles in the 
second entry, these productions would establish that they differ in the attribute of 
numerosity such that the second entry has one more circle. These productions would then 
determine that this difference also characterizes the relation between the second and third 
entries. Both of these differences would be noted in working memory, and would serve as 
the input to a production that hypothesizes a systematic variation in the numerosity of the 
circles across the three columns. The human counterpart of the pairwise comparison 
processes may be responsible for the one or more pairs of gazes between two related 
entries in the eye fixation protocols. 
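
The circle example can be written out as a small sketch of pairwise comparison. The representation of an entry as a dictionary of attribute values and the shape of the returned tuple are assumptions made for illustration.

```python
def compare(entry_a, entry_b, attribute):
    """Note whether two corresponding elements are the same or different
    on one attribute, with the signed difference when it is numeric."""
    a, b = entry_a[attribute], entry_b[attribute]
    if a == b:
        return ("same", attribute, None)
    numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
    return ("different", attribute, b - a if numeric else None)

# The row "o oo ooo" from the text, encoded by numerosity.
row = [
    {"figure": "circle", "numerosity": 1},
    {"figure": "circle", "numerosity": 2},
    {"figure": "circle", "numerosity": 3},
]
notes = [compare(x, y, "numerosity") for x, y in zip(row, row[1:])]
print(notes)  # [('different', 'numerosity', 1), ('different', 'numerosity', 1)]
```

The two identical notes are the kind of working-memory content that would let a later production hypothesize a systematic change in numerosity across the columns.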

Conceptual analysis 

The conceptual-analysis productions induce the rules that account for the variation 
among the figures and attributes in each of the first two rows. For example, if the 
numerosity of an element is one in column 1, two in column 2, and three in column 3, 
then a rule-induction production would hypothesize that the variation in numerosity is 
governed by a rule that says "add one as you progress rightward from column to column". 
The types of rules FAIRAVEN knows are: 

- Constant in a row 

- Quantitative pairwise progression 

- Distribution-of-three-values 

- Figure addition or subtraction 

Note that this list of rules does not include distribution-of-two-values, even though it 
is one of the rules governing the variation in some of the problems. The reason for 
omitting this rule is that problems containing this rule could not be solved with 
FAIRAVEN's limited correspondence-finding ability. Also, problems containing this rule were 
often unsolved by the median subjects whom FAIRAVEN was intended to simulate. 
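
As a rough illustration of how a pattern of values across a row might evoke one of these rule types, consider the following sketch. It is an assumption-laden simplification, not the model's actual productions: the function name and rule labels are hypothetical, and figure addition or subtraction is left out because it operates on whole figures rather than on a single attribute's values.

```python
from typing import Optional, Sequence


def induce_rule(values: Sequence) -> Optional[str]:
    """Classify the variation of one attribute across the three columns of a row."""
    a, b, c = values
    if a == b == c:
        return "constant-in-a-row"
    numeric = all(isinstance(v, (int, float)) for v in values)
    # Same nonzero step between columns 1-2 and columns 2-3.
    if numeric and (b - a) == (c - b) != 0:
        return "quantitative-pairwise-progression"
    if len({a, b, c}) == 3:
        return "distribution-of-three-values"
    return None  # no rule in this limited repertoire fits


print(induce_rule([1, 2, 3]))                          # quantitative-pairwise-progression
print(induce_rule(["vertical"] * 3))                   # constant-in-a-row
print(induce_rule(["diamond", "square", "triangle"]))  # distribution-of-three-values
```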

The main information on which the rule-induction productions operate is the pattern 
of pairwise similarities and differences. When a particular pattern of variation in the 
entries has been encoded in working memory, it directly evokes the appropriate rule-inducing 
production. Some of the productions in this module induce a rule to account for just one 
row at a time, whereas others induce a generalized form of the rule by combining the rules 
that apply to corresponding figures in both the first and the second rows. The 
generalization is made by expressing the rules in terms of variables rather than using the 
actual values encountered in the first two rows. The more general forms of the rules induced 
by the model are intended to be counterparts of the human subjects' verbal statements of 
the rules. In a later section, the simulation's and human subjects' statements of rules will 
be compared with respect to their content and the time in the trial at which they occur. 

The perceptual analysis and the conceptual analysis are applied to the second row 
much as to the first row, except that the processing of the second row includes one 
additional step, namely establishing correspondences between the figures in the first and 
second rows. The perceptual analysis of the first two entries in the third row is similar to 
the analysis of the second row, including encoding, finding correspondences, and doing 
pairwise comparisons to determine which figures or values vary and which are constant in 
the first two entries. When this processing has been done, the response-generation 
productions take over. 

Response generation and selection 

The productions in this module use the hypothesized rules and the information in the 
first two columns of the third row to generate the missing entry in the third column. The 
general form of the rule that applies to the first two rows must be instantiated in terms of 
the specific values encountered in the first two entries in the third row. In problems 
containing more than one rule, the inter-row correspondence between figures indicates which 
rules to associate with which figures. Then the instantiated rule (or rules) is applied to 
generate the missing entry. FAIRAVEN searches through the response alternatives for one 
that adequately matches the generated missing entry. 

FAIRAVEN's strategy of generating the figures and attributes of the missing entry 
and then finding it among the alternatives closely corresponds to what the higher-scoring 
subjects did. The lower-scoring subjects sometimes scanned the response alternatives before 
inducing the rules, particularly in the case of the more difficult problems. Other 
researchers have also found that lower-scoring subjects are more likely to use response 
elimination strategies for geometric analogy problems, whereas higher-scoring subjects are 
more likely to determine the properties of the desired response before examining the 
response alternatives (Bethell-Fox, Lohman & Snow, 1984; Dillon & Stevenson-Hicks, 1981). 

An example of FAIRAVEN's performance 

FAIRAVEN's processes can be illustrated by describing how the model solves the 
problem depicted in Figure 2. FAIRAVEN starts by examining the top row. The variation 
among the three entries in a row is found by examining the pairwise similarities and 
differences between the figures and attributes found in adjacent columns. The first pairwise 
comparison is between the entries in the first and second columns of the top row. The 
encoding productions determine that the first entry contains a diamond and line and the 
second entry contains a square and line. The productions that find correspondences use the 
matching-names heuristic to postulate a correspondence between the lines that occur in the 
two entries. Once a correspondence is found between the lines, the matching-leftovers 
heuristic is used to postulate a second correspondence between the diamond and the square. 
FAIRAVEN then compares the entries in the second and third columns. The lines in the 
second and third columns are postulated to correspond to each other, and the square is 
postulated to correspond to the triangle. The pattern of variation among the lines evokes 
the induction of a rule requiring that each entry in a row contain a line. Note that this is 
not the final form of the rule. The pattern of variation among the other figures in 
correspondence, namely the diamond, square and triangle, evokes the induction of a 
distribution-of-three-values rule, such that each row contains one each of a diamond, square 
and triangle in its three entries. 
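
The two correspondence heuristics used in this example can be sketched as follows. This is a simplified reading of the description above, not the model's productions; the function name and the representation of an entry as a list of figure names are assumptions.

```python
from typing import List, Tuple


def find_correspondences(entry_a: List[str], entry_b: List[str]) -> List[Tuple[str, str]]:
    """Pair figures in two adjacent entries using the two heuristics in turn."""
    left_a, left_b = list(entry_a), list(entry_b)
    pairs = []
    # Matching-names heuristic: figures with the same name correspond.
    for name in list(left_a):
        if name in left_b:
            pairs.append((name, name))
            left_a.remove(name)
            left_b.remove(name)
    # Matching-leftovers heuristic: a single unmatched figure on each side
    # is postulated to correspond to the other.
    if len(left_a) == 1 and len(left_b) == 1:
        pairs.append((left_a[0], left_b[0]))
    return pairs


print(find_correspondences(["diamond", "line"], ["square", "line"]))
# [('line', 'line'), ('diamond', 'square')]
print(find_correspondences(["square", "line"], ["triangle", "line"]))
# [('line', 'line'), ('square', 'triangle')]
```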

After these two rules have been induced, there is a second iteration of inspecting the 
entries in the first row. In the second iteration, the variation in the texture of the lines is 
noted and this evokes the rule that each set of lines in a row has a texture that is either 
black, striped or clear. On this second and subsequent iterations, one attribute (and its 
value) per figure is perceived. Thus, the total number of iterations on a row depends on 
the maximum number of attributes possessed by any of the figures. As the variation in 
each additional attribute is discovered, one or more additional rules are induced to account 
for the variation. Thus, the perceptual and conceptual analyses are temporally interwoven. 

The order in which the various attributes are processed is determined by the order in 
which they are encoded, which in turn is determined by their order in the stimulus 
description file, which in turn was guided by their order of mention by the subjects who 
only described the entries. So on the next iteration, FAIRAVEN encodes the orientation of 
the line. The value is vertical for each line, so FAIRAVEN hypothesizes a constant in a 
row rule. The final (null) iteration reveals no further percepts to be accounted for, so 
FAIRAVEN proceeds to the second row. Note the similarity of FAIRAVEN's processing to 
the protocol of the human subject shown in Table 2, reflecting the incremental re-iterative 
nature of the processing. For both the model and the subject, there are multiple visual 
scans of the first row, and in both cases, there is a considerable time interval between the 
induction of the different rules. 

The processing of the second row closely resembles that of the first row, in that the 
lines and geometric shapes are encoded, and the correspondence among the lines and among 
the shapes is noticed. The rules governing the geometric shapes, line textures and line 
orientation are induced. In addition, the correspondence between the geometric shapes in 
the first two rows is noticed, as is the correspondence between the lines, and a mapping is 
made between the rules for the two rows. It is noted that the rules governing line 
orientation are different in the two rows (constant vertical orientation in the first row, 
horizontal in the second row). Note that the subject's eye fixation protocol in Table 2 shows 
a scan of Row 2 interspersed with scattered inspections of Row 1, which may reflect the 
mappings from one row to another. 

FAIRAVEN proceeds to Row 3 having formulated a generalized form of the rules, 
namely distribution of the three geometric shapes, distribution of the three line textures, 
and a constant orientation of lines in all the entries in a row. The inspection of the 
geometric figures in the first two columns of Row 3 indicates which one of the triplet of 
shapes is missing (the square). Inspection of the line textures indicates which is missing 
(the black). Finally, the orientation of the lines in the first two columns indicates that the 
constant value of line orientation will be slanted from upper left to lower right. 

The application of the three rules to the knowledge about the first two entries of Row 
3 is sufficient to correctly generate the missing entry: a square and a line. Only the three 
response alternatives that contain a square and line (#2, #5, and #8) are given any further 
consideration. The generated missing entry contains a black slanted (from upper left to 
lower right) line which matches alternative #5, which is chosen as the answer. 
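
The response-generation step in this example can be sketched as below. The three generalized rules and the generated entry (a square with a black slanted line, matching alternative #5) come from the text. Everything else is hypothetical: the function name, the dictionary encoding, the exact assignment of shapes and textures to the first two entries of Row 3, and the contents of alternatives #2 and #8, which the text does not specify.

```python
def missing_value(rule_type, known_values, value_set=None):
    """Instantiate one generalized rule on the two known entries of Row 3."""
    if rule_type == "constant":
        return known_values[0]
    if rule_type == "distribution-of-three":
        (leftover,) = set(value_set) - set(known_values)
        return leftover
    raise ValueError(f"unsupported rule type: {rule_type}")


# Generalized rules induced from Rows 1 and 2 (from the text).
rules = {
    "shape": ("distribution-of-three", {"diamond", "square", "triangle"}),
    "texture": ("distribution-of-three", {"black", "striped", "clear"}),
    "orientation": ("constant", None),
}

# First two entries of Row 3 (attribute assignment is assumed for illustration).
row3 = [
    {"shape": "triangle", "texture": "striped", "orientation": "slanted"},
    {"shape": "diamond", "texture": "clear", "orientation": "slanted"},
]

generated = {
    attr: missing_value(kind, [entry[attr] for entry in row3], values)
    for attr, (kind, values) in rules.items()
}

# Alternatives containing a square and a line (contents of #2 and #8 are made up).
alternatives = {
    2: {"shape": "square", "texture": "striped", "orientation": "slanted"},
    5: {"shape": "square", "texture": "black", "orientation": "slanted"},
    8: {"shape": "square", "texture": "black", "orientation": "vertical"},
}

answer = next(n for n, alt in alternatives.items() if alt == generated)
print(generated)  # {'shape': 'square', 'texture': 'black', 'orientation': 'slanted'}
print(answer)     # 5
```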

FAIRAVEN solved 23 of the 34 problems it was given, the same as the median score 
of the 12 Carnegie Mellon students in Experiment 1a. Like the median subjects it is 
intended to simulate, FAIRAVEN solved the easier problems and could not solve most of 
the harder problems. The point-biserial correlation between the error rate for each problem 
in Experiment 1a and a dichotomous coding of FAIRAVEN's success or failure on the 
problem was r(32) = .67, p < .01, indicating that the model was more likely to succeed on 
the same problems that were solved by more of the human subjects. FAIRAVEN's 
performance on each problem is given in Appendix A. However, we will postpone a detailed 
analysis of the errors until Part III. 
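
For readers unfamiliar with the statistic, the point-biserial correlation reported above can be computed as in the following sketch. The formula is the standard one; the function name and the small data set are made up for illustration and are not the data from Experiment 1a.

```python
from math import sqrt
from typing import Sequence


def point_biserial(binary: Sequence[int], scores: Sequence[float]) -> float:
    """Correlation between a 0/1 coding and a continuous measure."""
    n = len(scores)
    group1 = [s for b, s in zip(binary, scores) if b == 1]
    group0 = [s for b, s in zip(binary, scores) if b == 0]
    mean = sum(scores) / n
    # Population standard deviation of the continuous measure.
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    m1 = sum(group1) / len(group1)
    m0 = sum(group0) / len(group0)
    return (m1 - m0) / sd * sqrt(len(group1) * len(group0) / n ** 2)


# Hypothetical example: 1 = model failed the problem, paired with a made-up
# human error rate (%) for the same problem.
model_failed = [0, 0, 0, 1, 1]
human_error_rate = [5, 10, 30, 60, 70]
r = point_biserial(model_failed, human_error_rate)
print(round(r, 2))
```

A positive value means that the problems the model fails tend to be the ones with higher human error rates, which is the pattern the reported correlation describes.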

For the present purposes, the important point is that FAIRAVEN performed credibly, 
but at the same time, it had several limitations that prevented it from solving more 
problems. First, FAIRAVEN had no ability to induce rules that do not contain 
corresponding arguments (figures or attributes) in all three columns. Consequently, 
FAIRAVEN could not solve the problems involving the distribution-of-two-values rule. 
Second, FAIRAVEN had difficulty in problems in which the correspondence among figural 
elements is not found by either the matching-names heuristic or the matching-leftovers 
heuristic, such as correspondences based on location or texture. In these cases, the initially 
hypothesized correspondences based on figure names did not result in a correct rule, but 
FAIRAVEN had no way to backtrack. Third, FAIRAVEN had difficulty when too many 
high-level goals arose at the same time, and FAIRAVEN tried to pursue them concurrently. 
This situation occurred in problems with three or four rule tokens, when the perceptual 
evidence to support the multiple rules emerged simultaneously. FAIRAVEN tried to 
confirm all the rules in parallel, as CAPS permits, but the resulting bookkeeping load was 
unmanageable. In spite of these limitations, it is important to note that this program was 
able to perform on an intelligence test as well as some college students, using strategies 
similar to theirs, and exhibiting similar behavior to theirs. 

BETTERAVEN 

The higher-scoring subjects in our experiments performed better than FAIRAVEN: 
what psychological processes distinguish them from the median-scoring subjects and from 
FAIRAVEN? The BETTERAVEN model is our best current answer. The development of 
BETTERAVEN used FAIRAVEN as a starting point and did as little reorganization and 
addition as possible. The resulting model, BETTERAVEN, exercises more direct strategic 
control over its processes. Also, BETTERAVEN can induce more abstract rules based on 
more abstract correspondences (permitting null arguments). 

BETTERAVEN's improved strategic control necessitated the addition of a fourth 
category of productions, as shown in the block diagram in Figure 10. The new category is 
a goal monitor that sets strategic and tactical goals, monitors progress towards them, and 
adjusts the goals if necessary. In addition, the control structure of BETTERAVEN, as 
governed by the goal monitor, is somewhat changed. In BETTERAVEN, only one category 
of productions can be operating at a given time. BETTERAVEN also had some changes 
made to the perceptual and conceptual analyzers. The correspondence-finding processes are 
more sophisticated, allowing BETTERAVEN to handle rules applying to null arguments, 
such as a distribution-of-two-values rule. The conceptual analyzer also has more rules in its 
repertoire and uses the goal monitor to control the order in which rules are induced. The 
responder is effectively unchanged from FAIRAVEN. 



Insert Figure 10 - BETTERAVEN modules 



The goal monitor 

A module containing 15 productions sets main goals and subgoals for the model. The 
main purposes of the goal monitor are to ensure that higher-level processes (namely, rule 
induction) occur serially and not concurrently, to provide an effective serial order for 
inducing rules (i.e. conflict resolution), to maintain an accounting of the model's progress 
towards its goals, and to appropriately modify its path to the solution when a difficulty is 
encountered. The goal monitor has a knowledge base that contains the goal structure for 
this task. For example, when starting to work a new problem, the goal monitor might set 
the following goals and subgoals, and keep a record of their satisfaction or non-satisfaction: 

Figure 10. A block diagram of BETTERAVEN. The distinction from FAIRAVEN visible 
from the block diagram is the inclusion of a goal monitor that generates and keeps 
track of progress in a goal tree. 

Top Goal: Solve problem 
    Subgoal 1: Find all rules in top row 
        Subgoal 2: Do a first scan of top row 
            Subgoal 3: Compare adjacent entries 
                Subgoal 4: Find what aspects are SAME, DIFFERENT, 
                           or NO-RELATION 

To attain these goals, each row is re-iteratively scanned and rules are induced to account 
for the variation, with the number of iterations increasing with the complexity of the 
entries. This behavior of the model is motivated by the re-iterative nature of the eye 
fixation data and by the concurrent verbal protocols. 
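
One way to picture the goal tree and its record of satisfaction is the sketch below. The goal names are taken from the example above; the class, its fields, and the helper function are assumptions made for illustration and do not reproduce the CAPS implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Goal:
    name: str
    satisfied: bool = False
    subgoals: List["Goal"] = field(default_factory=list)

    def add_subgoal(self, name: str) -> "Goal":
        child = Goal(name)
        self.subgoals.append(child)
        return child


def pending(goal: Goal) -> List[str]:
    """Depth-first list of the goals not yet marked as satisfied."""
    names = [] if goal.satisfied else [goal.name]
    for sub in goal.subgoals:
        names.extend(pending(sub))
    return names


top = Goal("Solve problem")
find_rules = top.add_subgoal("Find all rules in top row")
first_scan = find_rules.add_subgoal("Do a first scan of top row")
compare = first_scan.add_subgoal("Compare adjacent entries")
aspects = compare.add_subgoal("Find what aspects are SAME, DIFFERENT, or NO-RELATION")

# Suppose the two lowest-level subgoals have been satisfied.
aspects.satisfied = True
compare.satisfied = True
print(pending(top))
# ['Solve problem', 'Find all rules in top row', 'Do a first scan of top row']
```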

The management of the goal stack is under the exclusive control of the goal monitor. 
When it is appropriate to change the model's level of analysis, the goal monitor changes 
the current goal to either a parent goal or a subgoal. The consequence of setting a 
particular goal is to evoke some subset (module) of productions, such as the perceptual 
analysis module or the response generation module. The monitor keeps a record of which 
goals have been set, and what the current goal is. This knowledge makes it possible to 
backtrack where necessary. Four back-tracking productions take back specific hypothesized 
rules that have been proven unfruitful, as well as taking back hypotheses about what the 
relevant attribute is and which elements correspond to each other. It is important to note 
that both BETTERAVEN and FAIRAVEN have goal management capability, but that 
BETTERAVEN's capability was enhanced as described. 

Changes in the perceptual analyzer 

The major change to the perceptual analyzer is that the heuristics for finding 
correspondences among figures are more general, overcoming several difficulties encountered 
by FAIRAVEN's heuristics. One type of difficulty arose when the number of figures per 
entry was not the same in each of the entries in a row. This difficulty occurs in problems 
containing a distribution-of-two-values rule, as well as figure addition and subtraction, in 
which a figure in one entry has no counterpart whatsoever in another entry. Since 
FAIRAVEN insisted on assigning a counterpart to every figure in every entry, it would err 
in such problems (as did many of the lower-scoring subjects). To deal with such rules, 
BETTERAVEN's new correspondence-finding productions in the perceptual analyzer assign a 
leftover element in one of a pair of entries to a null counterpart in another entry. 
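
A minimal sketch of correspondence finding with null counterparts follows. The pairing of surplus figures with `None` mirrors the description above, but the function name, the list encoding of entries, and the decision to pair remaining leftovers in order before assigning nulls are assumptions, not details given in the text.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def correspondences_with_null(entry_a: List[str], entry_b: List[str]) -> List[Pair]:
    """Pair figures across two entries, allowing a null (None) counterpart."""
    left_a, left_b = list(entry_a), list(entry_b)
    pairs: List[Pair] = []
    # Same-named figures correspond.
    for name in list(left_a):
        if name in left_b:
            pairs.append((name, name))
            left_a.remove(name)
            left_b.remove(name)
    # Leftovers are paired in order; any surplus figure gets a null counterpart.
    pairs.extend(zip_longest(left_a, left_b))
    return pairs


print(correspondences_with_null(["circle", "line"], ["circle"]))
# [('circle', 'circle'), ('line', None)]
print(correspondences_with_null(["square"], ["square", "dot"]))
# [('square', 'square'), (None, 'dot')]
```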

A second type of difficulty arose when the correspondence was based on an attribute 
other than the figures' name (such as two different figures having the same texture or 
position). When the matching-names (or any other) heuristic fails to lead to a satisfactory 
rule, BETTERAVEN's goal monitor can backtrack, postulate a correspondence based on an 
alternative attribute, and proceed thenceforth. By contrast, FAIRAVEN kept no record of 
choosing a correspondence heuristic and had no way of backing up to it if the choice 
turned out incorrect. 
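
The contrast can be sketched as a loop that records each correspondence basis it tries and moves on when one fails to yield a satisfactory rule. The function name, the particular list of alternative bases, and the stand-in test for whether a basis yields a rule are all hypothetical; the text does not specify the order in which alternatives are tried.

```python
from typing import Callable, List, Optional, Tuple


def choose_correspondence_basis(
    heuristics: List[str],
    yields_rule: Callable[[str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Try each basis in turn, keeping a record so that failures can be backed out."""
    tried: List[str] = []
    for basis in heuristics:
        tried.append(basis)       # the record that makes backing up possible
        if yields_rule(basis):
            return basis, tried
        # Otherwise retract this hypothesis and fall through to the next basis.
    return None, tried


# Hypothetical problem in which only texture-based correspondence works.
heuristics = ["name", "texture", "position"]
basis, tried = choose_correspondence_basis(heuristics, lambda b: b == "texture")
print(basis)  # texture
print(tried)  # ['name', 'texture']
```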

Rule induction 

BETTERAVEN's rule induction was improved over FAIRAVEN's by virtue of serial 
rule induction (imposed by the goal monitor), the presence of a new rule 
(distribution-of-two-values), and more general rules for figure addition and subtraction 
(enabled by the improved correspondence finding). Furthermore, the goal monitor permits 
BETTERAVEN to backtrack when a postulated rule fails to account for the variation. 

Enforcing seriality in rule induction. In FAIRAVEN's solving of the easier problems, 
it did no harm to hypothesize all the rules concurrently, because the rules were different 
enough and few enough that the processing consequences were manageable. However, in 
problems containing more difficult rules and a larger number of rules, the concurrent 
postulation of several rules led to several difficulties. First, there were competing attempts 
to simultaneously account for the same variation with two or more rules, which made the 
bookkeeping requirements unacceptably large in FAIRAVEN. Second, in problems with 
many figures, there was so much variation in figures that some of the more arcane 
variation did not become evident unless the more mundane variation was first accounted for, 
or, in some sense, removed. For example, in one of the problems containing two rules, it 
is much easier for the program (and human subjects) to induce a distribution-of-two-values 
rule after the other figures have been accounted for with a figure addition rule. Finally, 
the human verbal protocols strongly suggested that the subjects attempted to fit only one 
rule at a time to the figures. Although CAPS permits parallelism at all levels, attempts at 
parallelism at FAIRAVEN's higher conceptual levels (namely rule induction) wreaked havoc, 
while parallelism at the lower perceptual levels caused no difficulty. 

To improve BETTERAVEN's performance, BETTERAVEN was permitted to induce 
only one rule at a time, and furthermore, productions from the different modules were not 
permitted to fire concurrently. So the perceptual and conceptual modules of BETTERAVEN 
differ from each other in two respects: the time at which they dominate (early in the trial 
for the perceptual module, versus late for the conceptual) and whether they can tolerate 
concurrence (concurrence for the perceptual, seriality for the conceptual). 

The seriality of rule-induction and consequent processing in BETTERAVEN is enforced 
by conflict-resolution rules that arbitrate between any of the rule types that are 
hypothesized concurrently. The priorities prevent the later rules from firing until the earlier 
rules have had a chance to try to account for the variation. The priority among the rule 
types in the model is the following: 

1. Constant in a row 

2. Quantitative pairwise progression 

3. Distribution-of-three-values 

4. Figure addition or subtraction 

5. Distribution-of-two-values 

BETTERAVEN's design required that there be an ordering, so that only one rule 
would be induced at a time, as it was in the human performance. However, 
BETTERAVEN's design did not dictate what that ordering should be. Three partial 
orderings were derived, based largely on several empirical results and logical considerations. 
First, the priority accorded to the constant in a row rule is based on the fact that it 
accounts for the most straightforward, null variation, and is so relatively easy that it 
sometimes goes unmentioned in the human protocols. However, recall that the data do not 
eliminate the possibility that this rule can be induced in parallel with others, so the 
ordering of this rule type should not be overinterpreted. Second, figure addition/subtraction 
has priority over distribution-of-two-values because it accounts for more figures in a row 
(each of the addends plus the sum, for a total of four figural components), while 
distribution-of-two-values accounts for only two figural components. Finally, quantitative 
pairwise progression is given priority over distribution-of-two-values by human subjects, 
as we learned from a study briefly described below. 
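
The effect of this priority ordering on concurrently hypothesized rule types can be sketched as follows. The ordering itself is the one listed above; the identifier names and the use of a simple sort are assumptions for illustration rather than a description of the conflict-resolution productions.

```python
from typing import Iterable, List

# Priority order taken from the list in the text (highest priority first).
RULE_PRIORITY = [
    "constant-in-a-row",
    "quantitative-pairwise-progression",
    "distribution-of-three-values",
    "figure-addition-or-subtraction",
    "distribution-of-two-values",
]


def induction_order(candidates: Iterable[str]) -> List[str]:
    """Order concurrently hypothesized rule types so that they are tried one at a time."""
    return sorted(candidates, key=RULE_PRIORITY.index)


def next_rule(candidates: Iterable[str]) -> str:
    """The single rule type allowed to fire first."""
    return induction_order(candidates)[0]


candidates = {"distribution-of-two-values", "figure-addition-or-subtraction"}
print(next_rule(candidates))  # figure-addition-or-subtraction
```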

Jan Maarten Schraagen performed a study in our laboratory that compared the 
relative time of mention of quantitative pairwise progression rules versus 
distribution-of-two-values rules. To control for the possibility that the order in which rules 
are induced depends primarily on the relative salience of the figural components to which they 
apply, two isomorphs of each problem were constructed, differing in which rule applied to which 
figural elements. For example, in one isomorph a quantitative pairwise progression rule 
might apply to the numerosity of lines, and a distribution-of-two-values rule might apply to 
some triangles. In the other isomorph, the quantitative pairwise progression rule would 
apply to the triangles, whereas the distribution-of-two-values rule would apply to the 
numerosity of lines. There were 86 observations (interpretable verbal protocols in correctly 
solved problems), and in 83% of these observations, the pairwise quantitative rule was 
induced before the distribution-of-two-values rule. This empirical finding confirms that at 
least part of the order in which the simulation induces the rules corresponds to the order 
in which people do. 



PART III: COMPARING HUMAN PERFORMANCE TO THE THEORY 

In this section, we compare the human performance to the simulation models for 
three types of performance measures: (1) error patterns, (2) the content of the rules that 
were induced, and (3) on-line measures, specifically, patterns of eye fixations and verbal 
reports. 

1. Error patterns 

As described earlier, FAIRAVEN solved 23 of the 34 problems, which is the median 
score of the 12 subjects in Experiment 1a. (Recall that only 32 of the problems were 
classifiable within our taxonomy). BETTERAVEN lived up to its name, solving all but the 
two unclassifiable problems, similar to the performance of the best subject in Experiment 
1a. Thus, the performance of FAIRAVEN and BETTERAVEN resembles the median and 
best performance respectively. The patterns of human errors will be analyzed in more detail 
below, to determine what characteristics of the problems are associated with the variation in 
error rates. Following this analysis, a more detailed comparison will be made between the 
human error patterns and those of the simulation models. 

Interpretable patterns of errors emerge when the problems are grouped according to 
the properties in our taxonomy. The error rates of problems grouped this way are given in 
Table 3. The rows of the table correspond to different problem types that are distinguished 
by the type of rule involved, the number of different types of rules, the total number 
of rules (of any type), and whether some of the problems in that group involved difficult 
correspondence-finding. The error patterns in Experiments 1a and 1b are generally 
consistent with each other, even though the two experiments are not exact replications, 
because only half of the problems in Experiment 1b are from the Raven Advanced 
Progressive Matrices test and half are similar problems from the Standard Progressive 
Matrices test. 

Insert Table 3 - problem char. table 

In general, the error rates increase down the column as the number of rules in a 
problem increases. The lowest error rate, 6% in Experiment 1a and 9% in Experiment 1b, 
is associated with problems containing only a pairwise quantitative progression rule, 
indicating how easy this rule type was for our sample of subjects. Problems with pairwise 
quantitative progression rules may be relatively easy because, unlike all the other rules, this 
rule can be inferred from a pairwise comparison of only two figures. Repeated pairwise 
fixations between adjacent entries occurred frequently, even for lower-scoring subjects. 
Pairwise comparison may be a basic building block of cognition, and consequently, it was 
made a basic architectural feature of the simulations. 

The next lowest error rate is associated with problems that contain a single token of 
a figure addition or subtraction rule, or a distribution-of-three-values rule, shown in the 
second and third rows of Table 3. The rules relating the three entries in these problem 
types require that the subject consider all three arguments simultaneously, rather than only 
generalize one of the pairwise relations. To induce these types of rules, the subject must 
reason at a higher level of abstraction than that needed for pairwise similarities and 
differences. The verbal protocols in these problems indicated that the subjects who were 
having difficulty often persisted in searching for a single pairwise relation that accounted for 
the variation among all three entries. 

The number of rule tokens appears to be a powerful determinant of error rate. The 
effect is seen clearly in the contrast between the relatively low error rate for problems with 
only one token (in the first three rows), averaging 16%, versus the error rate for problems 
with three or four tokens (in the last three rows), averaging 59%. One reason why it is 
harder to induce multiple rule tokens is that it requires a greater number of iterations of 
rule-induction to account for all of the variation. Moreover, keeping track of the variation 
associated with a first rule while inducing the second rule (or third rule) imposes an 
additional load on working memory. Approximately 50% of the errors on problems with 
multiple rules may arise from an incomplete analysis of the variation, as indicated in the 
ongoing verbal reports by a failure to mention at least one attribute or rule. Such 
incompleteness may be partially attributed to failing to maintain the goal structures in 
working memory that keep track of what variation is accounted for and what variation 
remains unexplained. Another process made more difficult by multiple rules is 
correspondence finding. As the number of rules increases, so does the number of figural 
elements or the number of attributes that vary across a row. This, in turn, increases the 
difficulty of conceptually grouping the elements that are governed by each rule token. 

The difficulties of correspondence finding were particularly apparent for problems with 
multiple possible correspondences and misleading cues to correspondences (like the problem 
in Figure 5 described earlier). An analysis of the subjects' verbal reports in all the 
problems identified as having misleading or ambiguous correspondence cues indicates that 
the correspondence finding process was a source of significant difficulty. The reports 
accompanying 74% of the errors in these problems indicated that the subject had either 
postulated incorrect correspondences among figural elements, or was not able to determine 
which elements corresponded. Sometimes subjects reflected this latter difficulty by saying 
that they couldn't see a pattern, even after extensive visual search or after having initially 
postulated and retracted various incorrect correspondences and rules. 

In contrast to the types of rules whose impact is evaluated in Table 3, the presence 
of a constant in a row rule had a small or negligible impact on performance. The mean 
error rate and response time for six problems containing the constant rule (involving 
distribution-of-three-values or figure addition/subtraction) was 30% and 38.9 seconds 
respectively, which is similar to those measures for eight comparable problems that did not 

Table 3 

Error rate (%) for different problem types 

Number of    Number of                                          Experiment 
Rule Types   Rule Tokens   Rule Type                            1a (n=12)   1b (n=22) 

    1            1         Pairwise progression                     6           9 
    1            1         Addition/Subtraction                    17          13 
    1            1         Distribution of 3 values                29          25 
    1            2         Distribution of 3 values                29          21 
    2            2         Two different rules (a)                 48          54 
    1           3,4        Distribution of 2 values (b)            56          42 
    2            4         Distribution of 2, 3 values (b)         59          54 
    1            3         Distribution of 3 values (b)            66          77 

(a) This category is a miscellany of problems that contain two different 
rule types, such as addition and distribution-of-three-values, or 
quantitative pairwise progression and distribution-of-three-values. 

(b) Corresponding elements are ambiguous or misleading for some or all of the 
problems in these categories. 

involve a constant rule (28%, 41.9 seconds). One possible reason for the minimal impact of 
a constant in a row rule is that unlike any other type of rule, it requires storing only one 
value (i.e., the constant) for an attribute. 

The analysis of the human error patterns above can be compared to those of the 
simulation models. To make this comparison, the problems shown in Table 3 were grouped 
further, dividing the table into the first four rows consisting of the easier problems and the 
last four rows consisting of the harder problems, namely, those involving multiple rules, 
more abstract rules, and/or misleading correspondences. The subjects to whom FAIRAVEN 
should be most similar are those with scores close to the median. The six subjects in 
Experiment 1a whose total score was within 10% of the median had a 17% error rate on 
the easier problems, and a 70% error rate on the harder problems. In comparison, 
FAIRAVEN has a 0% error rate on the easier problems, and a 90% error rate on the 
harder problems. Thus, the FAIRAVEN model has a similar error profile to the subjects it 
was intended to simulate, appropriately matching the difficulty these subjects have with 
problems containing multiple rule tokens and difficult correspondences. The BETTERAVEN 
model and the subjects to whom it should be similar (namely, the best subjects) can solve 
almost all of the problems, so they have similar (essentially null) error profiles. Appendix A 
indicates the performance on each problem of Experiment 1a by the human subjects and by 
the two simulation models. 

Modifications of BETTERAVEN 

In addition to comparing FAIRAVEN and BETTERAVEN to the human performance, 
it is possible to degrade various abilities of BETTERAVEN and examine the resulting 
changes in performance. A demonstration that degraded versions of BETTERAVEN 
account for intermediate levels of performance between the levels of FAIRAVEN and 
BETTERAVEN can provide converging support for the present analysis of individual 
differences. Graceful degradation of BETTERAVEN also provides a sensitivity analysis that 
can indicate which of the new features of BETTERAVEN contributed to its improved 
performance. "Cognitive lesions" were made in BETTERAVEN to assess how its added 
features contributed to its superiority over FAIRAVEN. The two features of 
BETTERAVEN that were modified pertained to (1) abstraction, in particular, the ability to 
induce the distribution-of-two-values rule and (2) goal management. 

Lesioning abstraction ability. One source of BETTERAVEN's advantage over 
FAIRAVEN is its ability to form abstract correspondences (involving null arguments) and 
hence induce the distribution-of-two-values rule. BETTERAVEN used this rule in nine of 
the eleven most difficult problems; these were all problems that FAIRAVEN did not solve 
and BETTERAVEN did. Because the abstraction ability was firmly enmeshed with 
BETTERAVEN's processing, it was not possible to lesion it without disabling 
BETTERAVEN entirely. However, it was possible to lesion (eliminate) the 
distribution-of-two-values rule from BETTERAVEN's repertoire, in a model called 
BETTERAVEN-without-distribution-of-2-rule. Not surprisingly, this modified model did not 
correctly solve the nine problems in which the rule had been used by BETTERAVEN (as 
shown in Appendix A), degrading its performance to the level of FAIRAVEN. However, it 
would be incorrect to conclude that this rule is the only property on which 
BETTERAVEN's superiority over FAIRAVEN is based, for two reasons. First, the ability 
to correctly induce the distribution-of-two-values rule depends on BETTERAVEN's ability to 
induce abstract correspondences, including the absence of an element. Second, this rule was 
evoked in problems involving multiple rules, and consequently, problems that taxed 
BETTERAVEN's goal management. As the next section demonstrates, the ability to 
manage goals also played a central role in BETTERAVEN's improvement over FAIRAVEN. 

Lesioning goal management. To examine how BETTERAVEN's performance is 
influenced by goal management capabilities, impaired versions of BETTERAVEN were 
created in which goal management competed with the ability to maintain and apply rules, 
to the extent that goal information was displaced from working memory. For example, in 
one of the lesioned models, if the problem required more than three rules to be induced 
and applied to the last row, then the extra rules (beyond three) displaced some of the 
remaining subgoals stored in the goal tree, and resulted in an erroneous response, in which 
only three rules were used to generate the response. This behavior corresponds to the 
human errors which are based on an incomplete set of rules. The modified versions of 
BETTERAVEN, which could maintain and apply either three, four or five rules before 
displacing goals from working memory, are called BETTERAVEN-3-rules, 
BETTERAVEN-4-rules and BETTERAVEN-5-rules, respectively. The performance of these 
modified versions is shown in Appendix A, along with the performance of the unmodified 
BETTERAVEN. In general, as the goal management information in BETTERAVEN was 
increasingly displaced by information about the rules, its ability to solve problems was 
degraded. BETTERAVEN-3-rules solved 11 fewer problems than the unmodified 
BETTERAVEN. BETTERAVEN-4-rules solved 8 fewer problems and BETTERAVEN-5-rules 
solved 4 fewer problems than the unmodified BETTERAVEN. The failures of the modified 
versions occurred primarily on problems with more rule tokens, namely the problems that 
require more goal management. 
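
The capacity lesion described above can be illustrated with a minimal sketch. This is not the CAPS production-system code of the original models; the function name `lesioned_response` and the representation of a problem as a plain list of rule tokens are assumptions made only for illustration. Rules beyond the capacity limit are dropped, standing in for displaced subgoals, so the response is built from an incomplete rule set.

```python
def lesioned_response(rules, capacity):
    """Hypothetical capacity-limited model (illustrative, not the original code).

    rules: list of rule tokens that must be applied to the last row.
    capacity: number of rules that can be held before goals are displaced.
    Returns the rules actually applied and whether the response is complete.
    """
    applied = rules[:capacity]           # rules beyond the limit are lost
    solved = len(applied) == len(rules)  # correct only if no rule was dropped
    return applied, solved


if __name__ == "__main__":
    # An assumed problem with four rule tokens.
    problem = ["rule-1", "rule-2", "rule-3", "rule-4"]
    for capacity in (3, 4, 5):
        applied, solved = lesioned_response(problem, capacity)
        print(capacity, len(applied), solved)
```

In this sketch a four-rule problem fails only under the three-rule limit, which mirrors the pattern that failures concentrate on problems with more rule tokens.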

The cognitive lesioning experiments produced intermediate levels of performance, 
accounting for the continuum of performance that lies between FAIRAVEN and 
BETTERAVEN. Moreover, the relation between the particular lesions and the resulting 
patterns of errors confirms the importance of abstraction and goal management in 
performing the Raven test. 

2. The rules that were induced 

The simulations can be evaluated in terms of the specific rules that they induce, in 
comparison to the rules of the subjects in Experiment 1b, who were instructed to try to 
solve each problem and then explicitly describe the rules they induced. The main 
comparison is based on rule descriptions provided by a plurality of the 12 (out of 22) 
subjects modeled by FAIRAVEN and BETTERAVEN, namely those 12 who attained at 
least the median score. Across the 28 problems in Experiment 1b, there was a total of 59 
attributes for which at least one subject gave a rule that was classifiable by our taxonomy. 

The main finding is that for 52 of the 59 attributes, BETTERAVEN induced the 
same rule as the plurality of the subjects. Four of the seven disagreements arose in cases 
where BETTERAVEN induced a distribution-of-two-values rule whereas the subjects induced 
figure addition or subtraction. The fit for FAIRAVEN was similar, except for problems 
involving the distribution-of-two-values rule, which FAIRAVEN did not solve. Thus, the 
simulation models match the subjects not only in which problems they solve, but also in 
the rules that they induce. 

Alternative rules to account for the same variation. In problems in which alternative 
rules can account for the same variation, there is a suggestion that higher-scoring subjects 
induced different rules than lower-scoring subjects. Consider again the earlier example of 
how two different rules might describe a series of arrows pointing to 12 o'clock, 4 o'clock 
and 8 o'clock; the variation can be described as a distribution-of-three-values or as a 
pairwise quantitative progression of an arrow's orientation, namely, a clockwise rotation of 
120 degrees beginning at 12 o'clock. Although both rules are sufficient to solve the 
problem, the transformational rule is preferable because it is typically more compact and 
generative: knowing the transformation and one of the values of an attribute is sufficient to 
generate the other two values (in the case of a quantitative progression rule) and the 
transformational rule usually applies more directly to successive rows. The verbal protocols 
were examined to determine whether transformational rules were more closely associated 
with correct solutions than distributional rules in the particular problems in which they were 
induced, and whether more generally, higher-scoring subjects were more likely to use 
transformational rules than distributional rules, compared to lower-scoring subjects. 
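
The sense in which the transformational rule is generative can be shown with a small sketch. The function names and the coding of arrow orientation in degrees (12 o'clock = 0, 4 o'clock = 120, 8 o'clock = 240) are assumptions introduced for illustration; neither simulation is claimed to represent orientations this way.

```python
def values_from_progression(start, step, n_entries=3, modulus=360):
    """Generate a row's values from one known value plus the transformation."""
    return [(start + i * step) % modulus for i in range(n_entries)]


def missing_from_distribution(all_values, observed):
    """Return the single unused value; requires knowing the full value set."""
    remaining = set(all_values) - set(observed)
    if len(remaining) != 1:
        raise ValueError("distribution rule needs exactly one unused value")
    return remaining.pop()


if __name__ == "__main__":
    # Transformational rule: one value and a 120-degree step suffice.
    print(values_from_progression(0, 120))
    # Distributional rule: all three values must already be known.
    print(missing_from_distribution([0, 120, 240], [0, 120]))
```

The progression function needs only a starting value and a step, whereas the distribution function cannot produce an answer unless the whole set of values has already been encoded.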

Twenty-one of the 34 problems in Experiment 1a (Set I-#12, II-L 3, 8, 10, 13, 16, 
17, 22, 23, 26, 27, 29, 31-36) evoked a mixture of transformational and distributional rules 
from different subjects. Each protocol for these problems was categorized as describing a 
transformation or distribution of values, including partial descriptions. A description that had 
both transformational and distributional characteristics was counted as transformational. 
There were 156 transformational responses, 90 distributional responses, and 6 that could not 
be classified in Experiment 1a. For the problems in question, the transformational responses 
were associated with considerably better performance (error rate of 31%) than were the 
distributional responses (53% error rate). In a separate analysis limited to only those 
problems in which a correct final response was made, 71% of the problems were 
accompanied by a transformational rule, while 29% were accompanied only by a 
distributional rule. Transformational descriptions were not only associated with success in 
the problem in which they occurred, they were also associated with subjects who did well in 
the test as a whole. Higher-scoring subjects were more likely to give transformational rules 
and less likely to give distributional rules. The ratio of transformational descriptions to 
distributional was 3.6:1 for the highest scoring subjects, but only 1:1 for the lowest scoring 
subjects. These results associate transformational rules with better performance. 

The rule-ordering in BETTERAVEN (e.g. giving precedence to the pairwise quantitative 
progression rule over the distribution-of-three-values rule) provides a tentative account for the 
finding that a transformational rule was strongly associated with better performance. It is 
possible that only the higher-scoring subjects have systematic preferences for some rule 
types over others, as BETTERAVEN does. By contrast, the choice among alternative rules 
may be random or in a different order for the lower-scoring subjects. Thus, the differences 
in preferences among alternative rules between the higher and lower-scoring subjects can be 
accommodated by an existing mechanism in BETTERAVEN. 

3. On-line measures 

The preliminary description of the results in Experiment 1a indicated that what was 
common to most of the problems and most of the subjects was the incremental problem- 
solving. The incremental nature of the processing was evident in both the verbal reports 
and eye fixations. In problems containing more than one rule, the rules are described one 
at a time, with substantial intervals between rules. Also, the induction of each rule 
consists of many small steps, reflected in the pairwise comparison of related entries. We 
now examine the incremental processing in the human performance in more detail in light 
of the theoretical models, and compare the human performance with the simulations' 
performance. The analyses focus on the effects of the number of rules in a problem on 
the number and timing of the re-iterative behavior. To eliminate the effects of 
differences among types of rules, the analyses are limited to only those problems that 
contained one, two or three tokens of a distribution-of-three-values rule, and no other types 
of rules. 

Inducing one rule at a time: verbal statements. The first way in which the rule 
induction is incremental is that in problems with multiple rules, only one rule is described 
at a time. The subjects appear to develop a description of one of the attributes in a row 
of entries, formulate it as a rule, verify whether it fits, and then go on to consider other 
unaccounted for variation. This psychological process is a little like a stepwise regression, 
accounting for the variance with one rule, then returning to account for the remaining 
variance with another rule, and so on. 
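
The one-rule-at-a-time cycle can be sketched as a loop over the attributes that remain unaccounted for. This is a simplified illustration, not the production-system implementation: the function names, the dictionary representation of a row, and the three rule tests (with their precedence order) are assumptions chosen for the example.

```python
def is_constant(values):
    return len(set(values)) == 1


def is_progression(values):
    # Only numeric attributes can show a quantitative progression.
    if not all(isinstance(v, (int, float)) for v in values):
        return False
    steps = [b - a for a, b in zip(values, values[1:])]
    return steps[0] != 0 and len(set(steps)) == 1


def is_distribution_of_three(values):
    return len(values) == 3 and len(set(values)) == 3


# Assumed precedence: earlier rules are tried first.
CANDIDATE_RULES = [
    ("constant in a row", is_constant),
    ("quantitative pairwise progression", is_progression),
    ("distribution-of-three-values", is_distribution_of_three),
]


def induce_rules_one_at_a_time(row):
    """row maps each attribute to its values across the three entries."""
    induced = {}
    unexplained = list(row)  # attributes whose variation is not yet explained
    while unexplained:
        attribute = unexplained.pop(0)  # take up one attribute per iteration
        induced[attribute] = None
        for name, fits in CANDIDATE_RULES:
            if fits(row[attribute]):    # verify the tentative rule
                induced[attribute] = name
                break
    return induced


if __name__ == "__main__":
    row = {"shape": ["circle", "square", "triangle"],
           "size": [1, 2, 3],
           "shading": ["black", "black", "black"]}
    print(induce_rules_one_at_a_time(row))
```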
Insert Figure 11 - time of statements of rules 

An analysis of the times at which the subjects report the rules in their verbal 
protocols strongly supports the interpretation that the rules are induced one at a time. In 
problems involving multiple rules, subjects generally stated each rule separately, with an 
approximately equal time interval separating the statements of the different rules. In the 
scoring of the verbal protocols, if a subject stated only the value of the attribute specified 
by the correct rule (e.g. "need a horizontal line") without stating the rule itself, this was 
counted as a statement of the rule. The descriptive statistic plotted in Figure 11 indicates 
the elapsed time from the beginning of the trial until the statement of the first rule, the 
second rule, and if there was one, the third rule. The verbal reports show a clear temporal 
separation between the statements of successive rules. The interval is much longer than 
the time needed to just verbalize the rules and seems most likely to reflect the fact that 
subjects induce the rules one at a time. The statement of a rule may lag behind the 
induction processes, but the long time between the rule statements strongly suggests that 
induction processes are serially executed. The time from the beginning of the trial until 
the first rule was stated was approximately 10 seconds for the five problems that had two 
rules per row; it then took another 10 seconds, on average, until the subject stated the 
second rule. Thus, it took an approximately similar amount of time to induce each of the 
rules. For the two more difficult problems, those involving three rules, the average time 
between each statement was close to 24 seconds. The fact that the inter-statement times 
were longer for the latter group of problems indicates that a rule takes longer to induce if 
there is additional variation among the entries (variation that eventually was accounted for 
by the additional rules). Several of the processes would be made more difficult by the 
additional variation, particularly correspondence-finding and goal management. 

In contrast to the 33 observations described above, there were four other trials in 
which subjects stated two rules together. In three of these cases, the time interval 
preceding the statement of the two rules together was approximately twice the time interval 
preceding the statement of single rules. We interpret this to mean that even when two 
rules are stated together, they may have still been induced serially, although we cannot rule 
out parallel processing of two rules at a slower rate on these four occasions. 

The assertion that the rules are induced one at a time must be qualified, to allow for 
the possibility that a constant in a row rule might be processed on the same iteration as 
another rule. Most of the problems in the subset analyzed above contained a constant in a 
row rule, but there was no systematic difference discernible in this small sample between 
those problems that did or did not contain a constant rule. (Recall that a linear regression 
accounted for more of the variance among the mean error rates of problems if the count 
of rules excluded any constant rule.) Moreover, a constant in a row rule was verbalized far 
less often than the other types of rules. The structure of the stimulus set does not permit 
us to draw strong conclusions about the way the constant in a row rule was processed. 

BETTERAVEN is similar to the human subjects in inducing one rule at a time, in 
that there is a separation between the times at which the rules in a problem are induced. 
On average, there are 23 CAPS cycles (with a range of 22-24) between the time of 
inducing successive rule tokens. However, BETTERAVEN is unlike the students in several 
ways. First, the time between inducing rules is not affected by the number of rules (i.e. the 
amount of variation) in a problem: the 23 cycle interval applies equally to problems with 
two rule tokens and those with three rule tokens. By contrast, human subjects take longer 
to state a rule in problems with three rule tokens than in problems with two rule tokens, 
as shown in Figure 11. This difference suggests that BETTERAVEN's goal management is 
too efficient, relative to the human subjects. BETTERAVEN also differs from the students 
in its non-parallel induction of a constant rule (the 23 cycle time between rules disregards 
any induction of constant rules). In this respect, BETTERAVEN seems less efficient than 
the human subjects, who may be able to induce a constant rule in parallel with another 
rule. 

Figure 11. The elapsed time from the beginning of the trial to the verbal description of 
each of the rules in a problem. Axis label: serial position of rule in verbal protocol. 

Inducing one rule at a time: eye fixation patterns. Another way to demonstrate that 
the rules are induced one at a time is to compare the eye-fixation performance on problems 
containing increasing numbers of rules, looking for evidence of re-iterations for problems 
with different numbers of rules. One of the most notable properties of the visual scan was 
its row-wise organization, consisting of repeated scans of the entries in a row. There was a 
strong tendency to begin with a scan of the top row and to proceed downward to 
horizontally scan each of the other two rows, with only occasional looks back to a 
previously scanned row. (This description applies particularly well to problems involving 
quantitative pairwise progression rules or addition or subtraction rules, and slightly less well 
to the problems in the subset that is being analyzed here, involving multiple distribution-of- 
three-values rules. In the latter problems, subjects also used a row organization, but they 
sometimes looked back to previously scanned rows). So it is reasonable to ask whether 
there is a dependence between the number of scans through the rows and the number of 
rules. 

The data indicate that in general, the number of times that subjects visually scanned 
a row (or a column, or occasionally, a diagonal) increased with the number of rules in the 
problem. A scan of a row was defined as any uninterrupted sequence of gazes on all three 
of the entries in that row, allowing re-fixation of any of the entries (and was similarly 
defined for a column scan). The analysis showed that as the number of rule tokens in a 
problem increases from 1 to 2 to 3, the number of row scans increases from 7.2 to 11 to 
25, as shown in the upper panel of Figure 12. It is likely that during the multiple scans 
associated with each rule, the rule is being induced and verified. 
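
The row-scan definition can be made concrete with a short sketch. It is an illustration of the stated definition rather than the scoring procedure actually used: the representation of a gaze as a (row, column) pair, the function name, and the choice to count each maximal run within a row as a single scan are all assumptions, and gazes on the response alternatives are ignored.

```python
from itertools import groupby


def count_row_scans(gazes, n_columns=3):
    """Count uninterrupted runs of gazes that cover every entry in one row.

    gazes: sequence of (row, column) pairs in temporal order (assumed format).
    Re-fixations within the row are allowed; leaving the row ends the run.
    """
    scans = 0
    # groupby yields maximal runs of consecutive gazes sharing the same row.
    for _, run in groupby(gazes, key=lambda cell: cell[0]):
        columns_seen = {column for _, column in run}
        if len(columns_seen) == n_columns:
            scans += 1
    return scans


if __name__ == "__main__":
    gazes = [(0, 0), (0, 1), (0, 0), (0, 2),   # full scan of the top row
             (1, 0), (1, 1),                   # incomplete: only two entries
             (0, 0),                           # brief look back
             (1, 0), (1, 1), (1, 2)]           # full scan of the middle row
    print(count_row_scans(gazes))
```

A column scan could be counted the same way by grouping on the column index instead of the row index.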



Insert Figure 12 - row scans and pairwise scans 

Incremental processing in inducing a rule. There are many small steps in inducing 
each rule. For example, in a problem containing a quantitative pairwise progression rule, 
BETTERAVEN can induce the rule in a tentative form after a pairwise comparison between 
the entries in the first two columns in the row. Then the second and third columns can 
be compared and a tentative rule induced, followed by a higher-order comparison that 
verifies or disconfirms the correctness of the tentative rules. In the case of 
disconfirmations, all of the preceding processes must be re-executed, generating additional 
pairwise comparisons. Thus, there are re-iterative cycles of encoding stimulus properties, 
comparing properties between entries, inducing a rule, and verifying the rule's adequacy. 

As the number of rules increases, so should the number of pairwise similarities and 
differences to be encoded, and consequently, the number of pairwise comparisons. The eye 
fixation data provide clear evidence supporting this prediction. A pairwise scan was defined 
as any uninterrupted sequence of at least three gazes alternating between any two entries, 
excluding those that were part of a row or column scan because they had already been 
included in the row-scanning measures described above. Consistent with the theoretical 
prediction, as the number of rule tokens in a problem increased from 1 to 2 to 3, the 
mean number of pairwise scans (of any length) increased from 2.3 instances to 6 to 16.2, 
as shown in the upper panel of Figure 12. 
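
A pairwise scan, as defined above, can be detected with a simple pass over the gaze sequence. The sketch below is an assumption-laden illustration: entries are represented by arbitrary labels, consecutive gazes are assumed to fall on different entries, the function name is hypothetical, and the exclusion of gazes already counted in row or column scans is omitted for brevity.

```python
def pairwise_scans(gazes, min_length=3):
    """Return the lengths of runs that alternate between exactly two entries.

    gazes: sequence of entry labels in temporal order (assumed format).
    A run A, B, A, B, ... continues while each gaze repeats the one two back.
    """
    lengths = []
    i = 0
    n = len(gazes)
    while i < n - 1:
        if gazes[i] == gazes[i + 1]:
            i += 1                     # not an alternation; move on
            continue
        j = i + 2
        while j < n and gazes[j] == gazes[j - 2]:
            j += 1                     # extend the alternating run
        if j - i >= min_length:
            lengths.append(j - i)
        i = j - 1                      # the last gaze may begin a new run
    return lengths


if __name__ == "__main__":
    lengths = pairwise_scans(list("ABABCDCE"))
    print(len(lengths), sum(lengths) / len(lengths))
```

The number of runs corresponds to the count of pairwise scans, and the mean of the returned lengths corresponds to the sequence-length measure.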

We can also determine whether the difficulty of making a pairwise comparison (as 
indicated by the sequence length of a pairwise scan) also increases in the presence of 
additional variation between the entries (as indicated by the number of rules). As shown in 
the lower panel of Figure 12, the number of rules in the problem had no effect on the 
mean sequence length of the pairwise scans. Thus, the pairwise scans may reflect some 
primitive comparison process that pertains to the induction of a single rule token and is 
uninfluenced by the presence of additional variation between the entries. This result is 
consistent with a theory that says that difficult problems are dealt with incrementally, by 
decomposing the solution into simple subprocesses. So some subprocesses, like the 
comparison of attributes of two elements, should remain simple in the face of complexity 
(as shown in the lower panel of Figure 12), even as other performance measures do show 
complexity effects (as shown in the upper panel). The decomposition implied by the various 
forms of incremental processing observed here is probably a common way of dealing with 
complexity. 

Figure 12. The upper panel shows that the number of row scans and pairwise scans 
increases with the number of rule tokens in the problem. The lower panel shows 
that the length of the pairwise scans (i.e. the number of alternating gazes between a 
pair of entries) is unaffected by the number of rule tokens. Axis label: number of rule tokens. 

Limitations of the model 

At both the micro and macro levels, FAIRAVEN and BETTERAVEN perform 
comparably to the college students that they were intended to model. They solve 
approximately the same subsets of problems as the corresponding students, and they induce 
similar sets of rules. Also, the simulations resemble the students in their reliance on 
pairwise comparisons and in their sequential induction of the rules. The simulations are 
both sufficient and plausible descriptions of the organization of processes needed to solve 
these types of problems. The commonalities of the two programs, namely, the incremental, 
re-iterative processing, express some of the fundamental characteristics of problem solving. 
The differences between the programs, namely, the nature of the goal management and 
abstraction, express the main differences among the individuals with respect to the 
processing tapped by this task. Although the simulations match the human data along 
many dimensions of performance, there are also differences. In this section, we address 
four such differences and their possible relation to individual differences in analytic 
intelligence. 

Perhaps the most obvious difference between the simulations and the human 
performance is that the simulations lack the perceptual capabilities to visually encode the 
problems. However, as we argued earlier, this does not compromise our analysis of the 
nature of individual differences because numerous psychometric studies suggest that the 
visual encoding processes are not sources of individual differences in the Raven test. This 
is not to say that visual encoding and visual parsing processes do not contribute to the 
Raven test's difficulty, but only that such processes are not a primary source of individual 
differences. In addition, the success of the simulation models suggests that the strictly 
visual quality of the problems is not an important source of individual differences: analogous 
problems in other modalities containing haptic or verbal stimuli would be expected to 
similarly tax goal management and abstraction. 

A second difference is that the simulations, unlike the students, don't read the 
instructions and organize their processes to solve the problems. Although this mobilization 
of processes is clearly an important part of the task and an important part of intelligence, 
it is an unlikely source of individual differences for this population. All of the college 
students could perform this task sufficiently well to solve the easier, quantitative pairwise 
comparison problems. Moreover, even though the meta-processes that assemble and 
organize the processes lie outside the scope of the current simulation, they could be 
incorporated without fundamentally altering the programs or their architecture (see, for 
example, Williams, 1972). 

A third feature that might appear to differentiate the simulations from human 
subjects is the difference between rule induction and rule recognition. FAIRAVEN and 
BETTERAVEN are given a set of possible rules and they only have to recognize which 
ones are operating in a given problem, rather than induce the rules "from scratch." 
However, with the notable exception of the distribution-of-two-values rule, the other rules are 
common forms of variation that were correctly verbally described by all subjects in some 
problems. Hence, the individual differences were not in the knowledge of a particular rule 
so much as in recognizing it among other variation in problems with multiple rule tokens. 
By contrast, knowledge of the distribution-of-two-values rule did appear to be a source of 
individual differences. We account for its unique status in terms of its abstractness and 
unfamiliarity. In fact, we express the better abstraction capabilities of BETTERAVEN both 
in terms of its ability to handle a larger set of patterns of differences and in its explicit 
knowledge of this rule. Thus, the difference between the two simulations expresses one 
sense in which knowledge of the rules distinguishes among individuals. On the other hand, 
BETTERAVEN does not have the generative capability of inducing all of the various types 
of abstract rules that one might encounter in these types of tasks: in this sense, it falls 
far short of representing the full repertoire of human induction abilities. 

A fourth limitation of the models is that they are based on a sample of college 
students who represent the upper end of the distribution of Raven scores and so the 
theoretical analysis cannot be assumed to generalize throughout the distribution. We would 
argue, however, that the characteristics which differentiate college students, namely, goal 
management and abstraction, probably continue to characterize individual differences 
throughout the population. But there is also evidence that low-scoring subjects sometimes 
use very different processes on the Raven test, which could obscure the relationship 
between Raven test performance and working memory for such individuals. For example, 
as mentioned previously, low-scoring subjects rely more on a strategy of eliminating some of 
the response alternatives, fixating the alternatives much sooner than high-scoring subjects 
(Bethell-Fox, Lohman & Snow, 1984; Dillon & Stevenson-Hicks, 1981). Moreover, the types 
of errors made by low-scoring adults frequently differ from those made by high-scoring 
subjects (Forbes, 1964) and may reflect less analysis of the problem. 

If such extraneous processes are decreased and low-scoring subjects are trained to use 
the analytic strategies of high-scoring subjects, the validity of the Raven test increases. 
One study, with 425 Navy recruits, found that for low-scoring subjects, the correlation 
between the Raven test and a wide-ranging aptitude battery increased significantly (from .08 
to .43) when the Raven problems were presented in a training program that was designed 
to reduce non-analytic strategies (Larson, Alderton & Kaupp, 1990). The training did not 
alter the correlation between the Raven test and the aptitude battery for subjects in the 
upper half of the distribution. The fact that the performance of the trained low-scoring and 
all of the high-scoring subjects correlated with the same aptitude battery suggests that after 
training, the Raven test drew on similar processes for each group. Thus, it is plausible to 
suppose that the current model could be generalized to account for the performance of 
subjects in the lower half of the distribution if they are given training to minimize the 
influence of extraneous processes. 



Part IV: COGNITIVE PROCESSES AND HUMAN INTELLIGENCE 

This section discusses the implications of the model for analytic intelligence. The 
first sections examine how abstraction and goal management are realized in other cognitive 
tasks. These sections focus primarily on goal management rather than abstraction, in part 
because abstraction is implicitly or explicitly incorporated into many theories of analytic 
intelligence, whereas goal management has received less attention. The final section 
examines what the Raven simulations suggest about processes that are common across 
people and across different domains. 
Abstraction 



Most intuitive conceptions of intelligence include an ability to think abstractly, and 
certainly solving the Raven problems involves processes that deserve that label. Abstract 
reasoning consists of the construction of representations that are only loosely tied to 
perceptual inputs, and instead are more dependent on high-level interpretations of inputs 
that provide a generalization over space and time. In the Raven test, more difficult 
problems tended to involve more abstract rules than the less difficult problems. 
(Interestingly, it intuitively seems as though the level of abstraction of even the most 
difficult rule, distribution-of-two-values, is not particularly great compared to the abstractions 
that are taught and acquired in various academic domains, such as physics or political 
science). The level of abstraction also appears to differentiate the tests intended for 
children from those intended for adults. For example, one characterization of the easy 
problems found in the practice items of Set I and in the Colored Progressive Matrices is 
that the solutions are closely tied to the perceptual format of the problem and, 
consequently, can be solved by perceptual processes (Hunt, 1974). By contrast, the 
problems that require analysis, including most of the problems in Set II of the Advanced 
Progressive Matrices, are not as closely tied to the perceptual format and require a more 
abstract characterization in terms of dimensions and attributes. 

Abstract reasoning has been a component of most formal theories of intelligence, 
including those of traditional psychometricians, such as Thurstone (1938), and more recent 
individual difference researchers (Sternberg, 1985). Also, Piaget's theory of intelligence 
characterizes childhood intellectual development as the progression from the concrete to the 
symbolic and abstract. We can now see precisely where the Raven test requires abstraction 
and how people differ in their ability to reason at various levels of abstraction in the Raven 
problems. 

Goal management 

One of the main distinctions between higher-scoring subjects and lower-scoring subjects 
is the ability of the better subjects to successfully generate and manage their problem- 
solving goals in working memory. In this view, a key component of analytic intelligence is 
goal management, the process of spawning subgoals from goals, and then tracking the 
ensuing successful and unsuccessful pursuits of the subgoals on the path to satisfying 
higher-level goals. Goal management enables the problem-solver to construct a stable 
intermediate form of knowledge about his progress (Simon, 1969). In Simon's words "... 
complex systems will evolve from simple systems much more rapidly if there are stable 
intermediate forms than if there are not. The resulting complex forms in the former case 
will be hierarchic ..." (p. 98). The creation and storage of subgoals and their interrelations 
permit a person to pursue tentative solution paths while preserving any previous progress 
he has made. The decomposition of the complexity in the Raven test and many other 
problems consists of the recursive creation of solvable subproblems. The benefit of the 
decomposition is that an incremental iterative attack can then be applied to the simplified 
subproblems. A failure in one subgoal need not jeopardize previous subgoals that were 
successfully attained. Moreover, the record of failed subgoals minimizes fruitless re-iteration 
along previously failed paths. But the cost of creating embedded subproblems, each with 
their own subgoals, is that they require the management of a hierarchy of goals. 
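
The bookkeeping that goal management involves can be sketched as a small goal tree. This is an illustrative data structure, not the goal monitor implemented in BETTERAVEN; the class name `Goal`, its fields, and the goal labels in the example are assumptions. The sketch shows how a failed subgoal is recorded without disturbing sibling subgoals that have already succeeded.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    name: str
    status: str = "pending"  # "pending", "succeeded", or "failed"
    subgoals: list = field(default_factory=list)

    def spawn(self, name):
        """Create a subgoal beneath this goal and return it."""
        child = Goal(name)
        self.subgoals.append(child)
        return child

    def pending_leaves(self):
        """Collect the leaf goals that still have to be pursued."""
        if not self.subgoals:
            return [self] if self.status == "pending" else []
        leaves = []
        for child in self.subgoals:
            leaves.extend(child.pending_leaves())
        return leaves


if __name__ == "__main__":
    root = Goal("solve problem")
    shape = root.spawn("induce rule for shape")
    texture = root.spawn("induce rule for texture")
    shape.status = "succeeded"                 # earlier progress is preserved
    attempt = texture.spawn("try progression rule")
    attempt.status = "failed"                  # recorded, so it is not retried
    texture.spawn("try distribution-of-three-values rule")
    print([goal.name for goal in root.pending_leaves()])
```

The cost noted in the text appears here as the tree itself: every additional level of subgoals is more state that has to be stored and tracked.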

Goal management probably interacts with another determinant of problem difficulty, 
namely the novelty of the problem. A novel task may require the organization of high-level 
goals, whereas the goals in a routine task have already been used to compile a set of 
procedures to satisfy them, and the behavior can be much more stimulus-driven (Anderson, 
1987). The use or organization of goals is a strategic level of thought, possibly involving 
meta-cognition, or requiring reflection. In the BETTERAVEN model, additional goal 
management mechanisms like selection among multiple goals, a goal monitor, and backup 
from goals, had to be included to solve the more difficult problems. However, if people 
had extensive practice or instruction on Raven problems, the goal management would 
become routine, and the problems thereby easier. Instruction of sixth graders in the use of 
the type of general strategy used by FAIRAVEN and BETTERAVEN improves their scores 
on Set I of the Raven test (Lawson & Kirby, 1981). 

This analysis of the source of individual differences in the Raven test should apply to 
other complex cognitive tasks as well. The generality of the analysis is supported by 
Experiment 2. which found a large correlation between the I^ven test and the execution of 
a Tower of Hanoi puzzle strategy that places a large burden on goal generation and goal 
management. The present analysis is also consistent with the high correlations among 
complex reasoning tasks with diverse content, such as the data cited in the introduction 
(Snow, Kyllonen & Marshalek, 1984; Marshalek, Lohman & Snow, 1983). These researchers 
and others have suggested that the correlations among reasoning tasks may reflect higher- 
level processes that are shared, such as executive assembly and control processes (see also 
Carroll, 1976; Sternberg, 1985). The contribution of the current analysis is to specify these 
higher-level processes and instantiate them in the context of a widely used and complex 
psychometric test. 
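
To illustrate why a goal-recursion strategy for the Tower of Hanoi places a burden on goal generation and goal management, the following Python sketch solves the puzzle recursively and records, for each move, how many new subgoals had to be generated since the previous move. The function name, the peg labels, and the way subgoals are counted are illustrative assumptions; they are not the procedure or the scoring used in Experiment 2.

```python
def goal_recursion(n, source="A", target="C", spare="B"):
    """Solve an n-disk Tower of Hanoi puzzle by goal recursion.

    Returns a list of (disk, from_peg, to_peg, new_subgoals) tuples, where
    new_subgoals is the number of pyramid subgoals generated since the
    previous move (a hypothetical bookkeeping scheme).
    """
    moves = []
    pending = 0  # subgoals generated since the previous move

    def move_pyramid(size, src, dst, other):
        nonlocal pending
        if size == 1:
            # A one-disk pyramid can be moved directly.
            moves.append((1, src, dst, pending))
            pending = 0
            return
        pending += 1  # subgoal: clear the smaller pyramid out of the way
        move_pyramid(size - 1, src, other, dst)
        moves.append((size, src, dst, pending))  # the largest disk is now free
        pending = 1   # subgoal: bring the smaller pyramid back on top
        move_pyramid(size - 1, other, dst, src)

    move_pyramid(n, source, target, spare)
    return moves


if __name__ == "__main__":
    for disk, src, dst, subgoals in goal_recursion(3):
        print(f"move disk {disk} from {src} to {dst} (new subgoals: {subgoals})")
```

Under this bookkeeping, the number of subgoals that must be generated and held before the first move grows with the size of the pyramid, whereas moving a larger disk once it has been uncovered requires no new subgoals.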

The Raven test's relation to other analogical reasoning tasks 

The analogical nature of the Raven problems suggests that the Raven processing 
models should bear some resemblance to other models of analogical reasoning. One of the 
earliest such AI projects was Evans' ANALOGY program (1968), which solved geometric 
analogies of the form (A:B :: C:[five choices]). Evans' program had three main steps. 
The program computed the spatial transformation that would transform A into B using 
specific knowledge of analytic geometry. It then determined the transformation necessary 
to transform C into each of the five possible answers. Finally, it compared and identified 
which solution transformation was most similar to that for transforming A into B, and 
returned the best choice. A major contribution of ANALOGY was that it specified the 
content of the relations and processes that were sufficient to solve problems from the 
American Council on Education examination. Although ANALOGY was not initially 
intended to account for human performance, Mulholland, Pellegrino & Glaser (1980) found 
aspects of the model did account for the pattern of response times and errors in solving 2 
X 2 geometric analogies. Both errors and response times increased with the number of 
processing operations, which Mulholland et al. attributed to the increased burden on working 
memory incurred by tracking elements and transformations. Thus, much simpler analogical 
reasoning tasks can reflect working memory constraints.^11 
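
The three steps attributed to ANALOGY above can be sketched in Python as follows. This is a minimal illustration, not Evans' implementation: figures are represented here as dictionaries of attribute values rather than through analytic geometry, and the function names and the overlap-based similarity measure are assumptions made for the example.

```python
def transformation(before, after):
    """Describe how `before` becomes `after` as a set of (attribute, old, new) changes."""
    keys = set(before) | set(after)
    return {(k, before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


def similarity(t1, t2):
    """Overlap between two transformations (1.0 when they are identical)."""
    if not t1 and not t2:
        return 1.0
    return len(t1 & t2) / len(t1 | t2)


def solve_analogy(a, b, c, choices):
    """Return the index of the choice D that best completes A:B :: C:D."""
    target = transformation(a, b)                          # step 1: A -> B
    candidates = [transformation(c, d) for d in choices]   # step 2: C -> each choice
    scores = [similarity(target, t) for t in candidates]   # step 3: compare
    return max(range(len(choices)), key=scores.__getitem__)


# Hypothetical example: a large figure becomes small.
A = {"shape": "square", "size": "large"}
B = {"shape": "square", "size": "small"}
C = {"shape": "circle", "size": "large"}
CHOICES = [
    {"shape": "circle", "size": "large"},
    {"shape": "circle", "size": "small"},
    {"shape": "triangle", "size": "small"},
    {"shape": "circle", "size": "large", "shading": "black"},
    {"shape": "square", "size": "small"},
]

if __name__ == "__main__":
    print(solve_analogy(A, B, C, CHOICES))
```

In this example, only the second choice changes the size of C and nothing else, so its transformation matches the A-to-B transformation exactly.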

Analogical reasoning in the context of simple 2x2 matrices has also been analyzed 
from the perspective of individual differences. The theoretical issue has been whether 
individual differences in the speed of specific processes (such as inferencing, mapping, 
verifying) account for individual differences in more complex induction tasks, like the Raven 
test. For example, Sternberg and Gardner (1983) found that a speed measure based on a 
variety of inference processes used in simple analogical and induction tasks was correlated 
with psychometrically-assessed reasoning ability. However, several other studies have failed 
to find significant correlations between the speed of specific inference processes and 
performance in a more complex reasoning task (Mulholland et al., 1980; Sternberg, 1977). 
The overall pattern of results suggests that the speed of any specific inference process is 
unlikely to be a major determinant of goal management. This conclusion is also supported 
by the high correlation between the Raven test and the Tower of Hanoi puzzle, a task that 
required very little induction. Nevertheless, some degree of efficiency in the more task- 
specific processes may be a necessary (if not sufficient) condition to free up working-memory 
resources for goal generation and management. The analysis of reasoning in simple 
analogies illuminates the task-specific inference processes, but is unlikely to account for the 
individual differences in the more complex reasoning tasks. 

What aspects of intelligence are common to everyone? 

The Raven test grew out of a scientific tradition that emphasizes the analysis of 
intelligence through the study of individual differences. The theoretical goal of the 
psychometric or differential approach (in contrast to its methodological reliance on factor 
analysis) is to account for individual performance, not simply some statistical average of 
group performance. The negative consequence of this approach is that it can conceptually 
and empirically exclude those processes that are necessary for intelligent behavior, but are 
common to all people, and hence not the source of significant differences among individuals. 
Computational models such as the Raven simulations must include both the processes that 
are common across individuals and those that are sources of significant differences. 
Consequently, the models provide insights into some of the important aspects of intelligence, 
such as the incremental and re-iterative nature of reasoning. 

Cognitive accounts of other kinds of ability, such as models of spatial ability (e.g. Just 
& Carpenter, 1985; Kosslyn, 1980; Shepard & Cooper, 1982) and language ability (e.g. Just 
& Carpenter, 1987; van Dijk & Kintsch, 1983) also contribute to the characterizations of 
intelligence. Newell has argued that psychology is sufficiently mature to warrant the 
construction of unified theories of cognition that encompass all of the kinds of thinking 
included in intelligence (as well as some others), and offers the SOAR model as his 
candidate (Newell, 1987). Although the collection of models for diverse tasks that we have 
developed is far more modest in scope, all of the models have been expressed in the same 
theoretical language (the CAPS production system), making the commonalities and 
differences relatively discernible. All of these models share a production-system control 
structure, a capacity for both seriality and parallelism, a representational scheme that 
permits different activation levels, and an information accumulation function (effectively, an 
activation integrator). One interesting difference among tasks is that some types of 
processes are easy to simulate with parallelism, while others are not (easy in the sense that 
the models can perform the task and still retain essential human performance 
characteristics). The processes that seem to operate well in parallel in the simulation models 
are highly practiced processes and lower-level perceptual processes. The simulation of 
higher-level conceptual processes is accomplished more easily with seriality, unless extensive 
increments to goal management are included. 

What the theory postulates about the commonalities of different people and different 
tasks reflects some of the observed performance commonalities. Many of the performance 
commonalities occur at the microstructure of the processing, which is revealed in the eye 
fixation patterns. The time scale of this analysis is about 300-700 milliseconds per gaze. 
Such processes are too fast for awareness or for including in a verbal report. The eye 
fixation analysis reveals iterations through small units of processing; the task is decomposed 
into manageable units of processing, each governed by a subgoal. Then, the subgoals are 
attacked one at a time. The problem decomposition and subgoaling reflect how people 
handle complexity beyond their existing operators in a number of domains, including text 
comprehension, spatial processing and problem solving. For example, in a mental rotation 
task, subjects decomposed a cube into smaller units that they then rotated one unit at a 
time (Just & Carpenter, 1985). Similarly, in the Raven test, even the simplest types of 
figural analogies were decomposed and incrementally processed through a sequence of 
pairwise comparisons. This segmentation appears to be an inherent part of problem-solving, 
and a facet of thinking that is common across domains in various tasks requiring analytic 
intelligence. 

Thus, what one intelligence test measures, according to the current theory, is the 
common ability to decompose problems into manageable segments and iterate through them, 
the differential ability to manage the hierarchy of goals and subgoals generated by this 
problem decomposition, and the differential ability to form higher-level abstractions. 

Appendix A 

Problems II-18 and II-19 were not classifiable by our taxonomy. 

Footnotes 

1. To protect the security of the Raven problems, none of the actual problems from the 
test are depicted here or elsewhere in this paper. Instead, the test problems are illustrated 
with isomorphs that use the same rules but different figural elements and attributes. The 
actual problems that were presented to the subjects are referred to by their number in the 
test, which can be consulted by readers. 

2. This analysis is row oriented. In most problems, the rule types are the same regardless 
of whether a row or column organization is applied; in our experiments, we found most 
subjects analyzed the problems by rows. Two of the problems on the test were 
unclassifiable within our taxonomy because the nature of their rules differed from all others. 

3. This taxonomy finds some converging support from an analysis of the relations used in 
figural analogies (both 2x2 and 3x3 matrices) from 166 intelligence tests (Jacobs & 
Vandeventer, 1972). Jacobs and Vandeventer found that 12 relations accounted for many of 
the analogical problems. Five of their relations are closely related to rules we found in the 
Raven: addition and added element (addition/subtraction), elements of a set (distribution of 
3 values), unique addition (distribution of 2 values) and identity (constant in a row). Some 
of the remaining relations, such as numerical series and movement in a plane, map onto 
our quantitative progression rule. The Jacobs and Vandeventer analysis suggests that 
relatively few relations are needed to describe the visual analogies in a large number of 
such tests. 

4. A control study showed that the deviations from conventional administration did not 
alter the basic processing in Experiment 1a. A separate group of 19 college students was 
given the test without recording eye fixations or requiring concurrent verbal protocols. This 
control group produced the same pattern of errors (r(25) = .93) and response times (r(25) 
= .89) as the subjects in Experiment 1a, for the 27 problems from Set II that both groups 
were presented. Furthermore, the error rate was slightly higher in Experiment 1a (33%) 
than in the control group (25%), demonstrating that the lower rate in Experiment 1a (and 
1b) than in Forbes' sample is likely due to our sample being comprised exclusively of 
college students, rather than to increased accuracy in Experiment 1a because of eye fixation 
recording or generation of verbal protocols. 

5. As expected, subjects with low Raven test scores (between 12-17 points) made more 
errors than other groups as the number of subgoals to be generated increased (their error 
rate was .13, .66 and .59, for moves involving the generation of 0, 1, and 2 or more 
subgoals, respectively). Their data was better fit by a model that assumed a more limited 
capacity working memory and more goal generation, even for the smallest sub-pyramid. 
Also, the lowest-scoring subjects were more likely to make multiple errors at a single move 
than other subject groups. The lowest-scoring subjects made an average of 17.2 such errors while 
solving the 5 puzzles, compared to only 4 such errors by the next lowest-scoring group 
(those with Raven test scores between 20-24 points). Multiple errors at a move suggest 
considerable difficulty in executing the strategy because only two errors were possible at 
each move and one of the two consisted of retracting the previous correct move. Later in 
the paper we will discuss evidence that subjects in the bottom half of the distribution are 
more influenced than those in the upper half by extraneous processes while performing the 
Raven test as well. 

6. This list also omits another rule, so obvious as to be overlooked in our task analyses 
and the subjects' verbal reports, but not by the simulation model. The overlooked rule 
is a constant everywhere rule. An example of this rule can be found in the problem shown 
in Figure 4b, in which every entry contains a diamond outline. In this particular example, 
the constant everywhere rule does not discriminate among the response alternatives because 
they all contain a diamond outline. Both FAIRAVEN and BETTERAVEN used a constant 
everywhere rule where applicable, but we will not discuss it further because of the minor 
role it plays in problem difficulty and individual differences. 

7. This estimate is based on problems containing either two or three tokens of the 
distribution-of-three-values rule. On 20% of the correct trials and 59% of the error trials, 
the verbal reports contained no evidence of the subject having noticed at least one of the 
critical attributes; that is, neither the rule itself nor any attribute or value associated with 
that rule was mentioned. We assume that 20% is an estimate of how often the verbal 
reports don't reflect an encoded attribute. Consequently, we can estimate that 80% (the 
complement of 20%) of the 59% of the error trials with incomplete verbal reports (or 
approximately 50%) may be attributed to incomplete encoding or analysis, not just an 
omission in the verbal report. 
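
The arithmetic behind the estimate in footnote 7 can be made explicit with a short Python sketch. The variable names are illustrative; the two percentages are the ones reported in the footnote, and treating the 20% figure as the rate at which reports omit an encoded attribute is the footnote's own assumption.

```python
# Proportion of correct trials whose verbal report omitted a critical attribute,
# taken as an estimate of how often a report fails to reflect an encoded attribute.
p_report_omits_encoded = 0.20

# Proportion of error trials whose verbal report omitted a critical attribute.
p_error_trials_silent = 0.59

# Share of error trials attributable to incomplete encoding or analysis:
# the complement of 20%, applied to the 59% of error trials.
p_incomplete_encoding = (1 - p_report_omits_encoded) * p_error_trials_silent

print(f"{p_incomplete_encoding:.3f}")
```

The product is about .47, which the footnote rounds to approximately 50%.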

8. The 12 subjects fully described a classifiable rule in only 51% of the 708 (12x59) cases. 
The agreement among those subjects who described a rule for a given attribute was very 
high (only 7% of the 708 cases were disagreements with a plurality); however, in 37% of 
the cases, the rule descriptions were absent or incomplete, so the pluralities are sometimes 
small. 

9. The reason for BETTERAVEN not using figure addition or subtraction in these cases is 
that they required a more general form of addition or subtraction than BETTERAVEN 
could handle. In one problem, the horizontal position of the figural element that was being 
subtracted was also being changed (operated on by another rule) from one column to the 
next. In the other problem, some figural elements had to be subtracted in one row, but 
added in another row, so both types of variation would have to have been recognized as a 
form of a general figure addition/subtraction. Because BETTERAVEN's addition and 
subtraction rules were too specific to apply to these two situations, its distribution-of-two- 
values rule applied instead. The human subjects' ability to use an addition or subtraction 
rule testifies to the greater generality of their version of the rule compared to 
BETTERAVEN. 

10. The gaze analyses of the problems containing different numbers of tokens of 
distribution-of-three-values rules were applied to the protocols of 6 of 7 scorable subjects (who 
happened to be the higher-scoring subjects), eliminating those trials on which subjects made 
an error, or when the eye fixation data were lost due to measurement noise. The seventh 
scorable eye fixation protocol was excluded because it came from the lowest scoring subject, 
who had too few correct trials to contribute. The data in Figure 12 are from 10 
observations of problems Set I-#7, Set II-#17, each of which contains 1 rule, 25 
observations of problems Set I-#8, 9, Set II-#1, 13 and 27, which contain 2 rule tokens, 
and 5 observations of problems Set II-#29 and 34, which contain 3 rule tokens. 

11. Although there has been a large amount of subsequent artificial intelligence research 
on analogical reasoning, most of the work has focused on knowledge representation, 
knowledge retrieval and knowledge utilization rather than inferential and computational 
processes (see the summary by Hall, 1989). Analogical reasoning is viewed as a 
bootstrapping process to promote learning and the application of old information in new 
domains (Becker, 1969; McDermott, 1979). In psychology, this view of analogical reasoning 
has resulted in research that examines the conditions under which subjects recognize 
analogous problem solutions (Gick & Holyoak, 1983) and the contribution of analogical 
reasoning to learning (Gentner, 1983). 